From John.Rose at Sun.COM Sat Feb 2 05:17:14 2008
From: John.Rose at Sun.COM (John Rose)
Date: Sat, 02 Feb 2008 05:17:14 -0800
Subject: FYI: blog on Da Vinci Machine vs. Microsoft DLR/CLR
Message-ID: <30C549F7-B1B1-4C22-90DF-BE433842AB52@Sun.COM>

I spent the early week learning about the competition.
I hope you enjoy my notes:
   http://blogs.sun.com/jrose/entry/bravo_for_the_dynamic_runtime

Best wishes,
-- John

From John.Rose at Sun.COM Sun Feb 24 01:01:43 2008
From: John.Rose at Sun.COM (John Rose)
Date: Sun, 24 Feb 2008 01:01:43 -0800
Subject: review request (M): 6667042 PrintAssembly option does not work without special plugin
Message-ID: <2CE69155-1EE2-45B4-BF7F-65C5C6CD5257@sun.com>

For putback to http://hg.openjdk.java.net/jdk7/hotspot-comp-gate/hotspot

6667042: PrintAssembly option does not work without special plugin
Summary: remove old private plugin interface, simplify, rework old plugin to use unchanged Gnu sources

http://webrev.invokedynamic.info/jrose/6667042

-- John

P.S. More details:

The following HotSpot options (with -XX:) require an externally loadable disassembler module:

  +PrintAssembly           print assembly code for bytecoded and native methods
  +PrintNMethods           print nmethods as they are generated
  +PrintNativeNMethods     print native method wrappers as they are generated
  +PrintSignatureHandlers  print native method signature handlers
  +PrintAdapterHandlers    print adapters (i2c, c2i) as they are generated
  +PrintStubCode           print stubs: deopt, uncommon trap, exception, safepoint, runtime support
  +PrintInterpreter        print interpreter code

The original module ("disassembler.so") was dynamically loaded and called via a private C++ interface. It was implemented in the late 1990's using an early version of the Gnu disassembler, gdb-4.16.

Three factors made this age poorly.
* The interface (C++ ABI) required close coupling between the JVM build toolchain and the disassembler.
* The implementation of the disassembler was produced by forking a copy of binutils and editing it.
* The implementation was not released to open source.

It has been replaced by a plugin, also in a shared library, called via an ANSI C interface. An implementation of this interface on top of the current Gnu binutils is released as open source. It may be implemented in other ways also.

Changes to JVM command line interface:
* Enable these options under UnlockDiagnosticVMOptions in product mode.
* Make the CompileCommand 'print' use the external disassembler, if present.
* -XX:CompileCommand='option',$method,'PrintOptoAssembly' produces the old print command output.
* -XX:CompileCommand='option',$method,'PrintNMethods' continues to work.

The DisassemblerEnv supplied a number of callbacks, but only two were significant: print and print_address.

From Vladimir.Kozlov at Sun.COM Wed Feb 27 15:28:24 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Wed, 27 Feb 2008 15:28:24 -0800
Subject: Request for reviews (S): 6590177, 6667573, 6667580, 6667581, 6667588, 6667595, 6667618
Message-ID: <47C5F218.1040902@sun.com>

Resending to openjdk.

I fixed changes for 6667580 after Chuck's review.
And I redid changes (now smaller and one file) for 6590177.

Thanks,
Vladimir

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6590177/index.html

Fixed 6590177: jck60019 test assert(!repeated,"do not walk merges twice")

Problem:
In the current code a store node could already have been processed and removed from worklist_store while still being marked in the should_not_repeat vector set. The next time we process the same store node it is not on worklist_store but is in should_not_repeat, and that triggers the assert. With EA this assert was hit more often since EA added instance memory slices, which can create a more complex memory graph.

Solution:
Use an additional worklist for all visited mergemem nodes to avoid processing them twice.
The vectorset should_not_repeat and the assert were removed since they duplicate the functionality of the new worklist.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667573/index.html

Fixed 6667573: The method set_req_X() should not be used during Parse

Problem:
AddPNode::Ideal() uses set_req_X() to update inputs during all phases, while other Ideal() methods only use it for IGVN. It causes a problem in EA during split unique types since it removes nodes which weren't fully constructed.

Solution:
Use set_req_X() only for IGVN.

--------------------------------------------------------------------------------------------
http://awebrev.invokedynamic.info/kvn/6667580/index.html

Fixed 6667580: Optimize CmpP for allocations

Problem:
jvm2008 'serial' benchmark has code which calls String.equals(str1, str2) for 2 strings produced by StringBuilder.toString(). The first check in String.equals() could be optimized out since, due to inlining, we know during the compilation that it compares 2 newly allocated strings.

Solution:
Optimize out CmpP if it compares new local objects or a new local object and an input parameter.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667581/index.html

Fixed 6667581: Don't generate initialization (by 0) code for arrays with size 0

Problem:
Some arrays are created with a size of 0 that is known during compilation. We should avoid generation of initialization (by 0) code for such arrays.

Solution:
Don't generate initialization (by 0) code for arrays with size 0.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667588/index.html

Fixed 6667588: Don't generate duplicated CMP for float/double values

Problem:
C2 has an optimization which replaces diamond graphs with CMove (method PhaseIdealLoop::conditional_move()).
Each CMove instruction has its own Cmp instruction to produce condition flags. But a float/double Cmp is expensive and it would be better to use a diamond graph if we have more than one move.

Solution:
Don't generate a float CMove instruction if there is more than one move depending on the condition.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667595/index.html

Fixed 6667595: Set probability FAIR for pre-, post- loops and ALWAYS for main loop

Problem:
C2 clones the original loop's probability when it constructs the pre-, main and post- loops. Then C2 adds a zero-trip guard for the main loop with probability FAIR. As a result the main loop's probability could be half of the pre-loop's probability, and RA may generate stack spills in the main loop and not in the pre-loop.

Solution:
Set probability FAIR for pre-, post- loops and ALWAYS for the zero-trip guard for the main loop, assuming the main loop is always executed.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667618/index.html

Fixed 6667618: Disable LoadL->ConvL2I ==> LoadI optimization

Problem:
The LoadL->ConvL2I ==> LoadI optimization causes a problem for object initialization after InitializeNode was added in C2, since LoadI will not match the raw StoreL captured by InitializeNode. As a result the LoadI node could be scheduled above the initializing StoreL node.

Solution:
1. Disable the LoadL->ConvL2I ==> LoadI optimization.
2. Remove code in MemNode::Ideal_common() which allows a Load node to bypass the initializing Store node after macro expansion of the Initialize node.
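
[Editorial sketch, not part of the original message.] For readers unfamiliar with the LoadL->ConvL2I ==> LoadI rewrite named in 6667618, the Java snippet below illustrates only the value-level equivalence such a rewrite relies on: loading 64 bits and truncating gives the same int as loading just the 32 bits that hold the low half. It is not C2 code. The class and method names are hypothetical, and a java.nio.ByteBuffer stands in for raw memory. It does not model the scheduling problem described above, where the narrowed load no longer matches the 64-bit store.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; nothing here exists in HotSpot.
public class NarrowLoadSketch {

    // Models "LoadL -> ConvL2I": load 64 bits, then keep the low 32 bits.
    public static int loadLongThenTruncate(ByteBuffer mem, int offset) {
        return (int) mem.getLong(offset);
    }

    // Models "LoadI": load only the 32 bits that hold the low half.
    // Where the low half lives depends on the byte order of the memory.
    public static int loadLowHalfDirectly(ByteBuffer mem, int offset) {
        int lowHalfOffset =
            (mem.order() == ByteOrder.BIG_ENDIAN) ? offset + 4 : offset;
        return mem.getInt(lowHalfOffset);
    }

    public static void main(String[] args) {
        ByteOrder[] orders = { ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN };
        for (ByteOrder order : orders) {
            ByteBuffer mem = ByteBuffer.allocate(16).order(order);
            mem.putLong(8, 0x1122334455667788L);   // the "StoreL"
            int viaLong = loadLongThenTruncate(mem, 8);
            int viaInt = loadLowHalfDirectly(mem, 8);
            System.out.println(order + ": same value = " + (viaLong == viaInt));
        }
    }
}
```

The two loads agree on the value, but they differ in width and (on big-endian) in address, which is consistent with the message's point that the narrowed LoadI does not match the raw StoreL.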

From Vladimir.Kozlov at Sun.COM Thu Feb 28 10:59:58 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Thu, 28 Feb 2008 10:59:58 -0800
Subject: Request for reviews (S): 6590177, 6667573, 6667580, 6667581, 6667588, 6667595, 6667618
In-Reply-To: <47C5F218.1040902@sun.com>
References: <47C5F218.1040902@sun.com>
Message-ID: <47C704AE.8040607@sun.com>

Thank you all for the reviews.

I will start pushing today the changesets which were approved.
I talked with John C. yesterday and he is fine with it.

Thanks,
Vladimir

From Vladimir.Kozlov at Sun.COM Thu Feb 28 14:31:56 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Thu, 28 Feb 2008 14:31:56 -0800
Subject: Request for reviews (S): 6667595
In-Reply-To: <47C60390.1000408@sun.com>
References: <47C5F218.1040902@sun.com> <47C60390.1000408@sun.com>
Message-ID: <47C7365C.5090300@sun.com>

I will suspend the push of the 6667595 changes (see at the bottom) for further investigation since it introduced a regression in scimark.sparse, jvm98.compress, jvm98.mpegaudio.
Without it the rest of the changes are fine.

Thanks,
Vladimir

% rwcompare -r ref.server_5pb0_m11 ref.server_5pb0_6667595
============================================================================
ref.server_5pb0_m11
Benchmark       Samples        Mean     Stdev
jetstream            20      133.45     14.54
  Copy               20      107.95     52.44
  Parse              20      115.25      0.55
  Read               20       18.20      0.52
  Write              20      162.55     18.26
scimark              20      628.57      0.75
  Sparse             20      491.83      1.19
  LU                 20     1266.50      2.60
  SOR                20     1040.44      1.69
  FFT                20       35.32      0.03
  Monte              20      308.76      0.08
specjvm98            20      582.67      2.70
  javac              20      406.57      3.60
  db                 20      321.79      2.20
  jess               20      626.98      3.41
  jack               20      465.16      5.81
  compress           20      501.72      8.59
  mtrt               20     1596.34     19.49
  mpegaudio          20      746.32      1.80
volano25             20   175659.75   2567.56
  connections        20      400.00      0.00
  time               20        4.56      0.07
============================================================================
ref.server_5pb0_6667595
Benchmark       Samples        Mean     Stdev   %Diff       P  Significant
jetstream            20      132.99     15.36   -0.35   0.923  *
  Copy               20      114.20     53.06   -5.79   0.710  *
  Parse              20      114.85      0.49    0.35   0.020  *
  Read               20       18.00      0.46    1.10   0.207  *
  Write              20      157.95     17.77    2.83   0.424  *
scimark              20      627.41      0.54   -0.18   0.000  Yes
  Sparse             20      484.32      0.20   -1.53   0.000  Yes
  LU                 20     1268.91      2.34    0.19   0.004  Yes
  SOR                20     1039.75      1.72   -0.07   0.209  *
  FFT                20       35.32      0.03    0.00   0.996  *
  Monte              20      308.77      0.08    0.00   0.864  *
specjvm98            20      581.29      1.62   -0.24   0.059  *
  javac              20      405.99      3.79   -0.14   0.623  *
  db                 20      321.75      2.31   -0.01   0.956  *
  jess               20      628.42      2.32    0.23   0.127  *
  jack               20      464.62      5.79   -0.12   0.768  *
  compress           20      494.36      1.97   -1.47   0.001  Yes
  mtrt               20     1612.23     16.80    0.99   0.009  Yes
  mpegaudio          20      737.84      1.82   -1.14   0.000  Yes
volano25             20   175913.16   1695.56    0.14   0.715  *
  connections        20      400.00      0.00    0.00   0.000  Yes
  time               20        4.55      0.04    0.16   0.697  *
============================================================================

% rwcompare -r ref.server_5pb0_m11 ref.server_5pb0_no_6667595
============================================================================
ref.server_5pb0_m11
Benchmark       Samples        Mean     Stdev
jetstream            20      133.45     14.54
  Copy               20      107.95     52.44
  Parse              20      115.25      0.55
  Read               20       18.20      0.52
  Write              20      162.55     18.26
scimark              20      628.57      0.75
  Sparse             20      491.83      1.19
  LU                 20     1266.50      2.60
  SOR                20     1040.44      1.69
  FFT                20       35.32      0.03
  Monte              20      308.76      0.08
specjvm98            20      582.67      2.70
  javac              20      406.57      3.60
  db                 20      321.79      2.20
  jess               20      626.98      3.41
  jack               20      465.16      5.81
  compress           20      501.72      8.59
  mtrt               20     1596.34     19.49
  mpegaudio          20      746.32      1.80
volano25             20   175659.75   2567.56
  connections        20      400.00      0.00
  time               20        4.56      0.07
============================================================================
ref.server_5pb0_no_6667595
Benchmark       Samples        Mean     Stdev   %Diff       P  Significant
jetstream            20      131.83     17.43   -1.21   0.751  *
  Copy               20      122.80     54.19  -13.76   0.384  *
  Parse              20      115.05      0.60    0.17   0.281  *
  Read               20       17.95      0.69    1.37   0.203  *
  Write              20      152.35     11.45    6.27   0.042  *
scimark              20      628.91      0.63    0.05   0.124  *
  Sparse             20      492.13      0.28    0.06   0.277  *
  LU                 20     1267.55      2.02    0.08   0.162  *
  SOR                20     1040.78      1.74    0.03   0.540  *
  FFT                20       35.33      0.02    0.02   0.316  *
  Monte              20      308.78      0.06    0.01   0.445  *
specjvm98            20      583.30      2.55    0.11   0.454  *
  javac              20      406.16      4.43   -0.10   0.749  *
  db                 20      321.39      2.53   -0.13   0.593  *
  jess               20      626.66      2.82   -0.05   0.751  *
  jack               20      467.51      4.96    0.50   0.178  *
  compress           20      501.30      6.51   -0.08   0.861  *
  mtrt               20     1606.65     19.31    0.65   0.101  *
  mpegaudio          20      746.07      2.02   -0.03   0.679  *
volano25             20   175220.75   2102.28   -0.25   0.558  *
  connections        20      400.00      0.00    0.00   0.000  Yes
  time               20        4.57      0.06   -0.24   0.573  *
============================================================================

Tom Rodriguez wrote:
>
> 6667595: looks ok. That seems like it will have a lot of impact on the
> generated code. have you benchmarked that change by itself?
>
> Vladimir Kozlov wrote:
>> --------------------------------------------------------------------------------------------
>>
>> http://webrev.invokedynamic.info/kvn/6667595/index.html
>>
>> Fixed 6667595: Set probability FAIR for pre-, post- loops and ALWAYS
>> for main loop
>>
>> Problem:
>> C2 clone original loop's probability when it constructs pre-, main,
>> post- loops.
>> Then C2 adds zero-trip guard for main loop with probability FAIR.
>> As result a main loop's probability could be half of a pre-loop's
>> probability
>> and RA may generates stack spills in the main loop and not in the
>> pre-loop.
>>
>> Solution:
>> Set probability FAIR for pre-, post- loops and ALWAYS for zero-trip
>> guard for main loop
>> assuming a main loop is executed always.
>>

From Vladimir.Kozlov at Sun.COM Thu Feb 28 18:34:10 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Thu, 28 Feb 2008 18:34:10 -0800
Subject: Request for reviews first part EA update (M):
Message-ID: <47C76F22.2080803@sun.com>

Here are the first changes for Escape Analysis.
This is preparation for the EA scalar replacement optimization and for problem fixes.
It passed PRT with EA switched off (current default).
I will run tonight with EA on.

Thanks,
Vladimir

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667605/index.html

Fixed 6667605: Escape Analysis: always inline java constructors with EA

Problem:
A java constructor should be inlined so that EA can scalar replace the allocation.

Solution:
Inline Java constructors when EA is on.
Also replace duplicated code in PhaseGVN::transform() with the call to transform_no_reclaim(n) (I have wanted to do this for several years already :) ).

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667610/index.html

Fixed 6667610: Escape Analysis: retry compilation without EA if it fails

Problem:
During split unique types EA could exceed the nodes limit and fail the method compilation.

Solution:
Retry the compilation without EA.
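
[Editorial sketch, not part of the original message.] The fallback described for 6667610 can be pictured as "try once with EA, and on a node-limit failure try again without it". The Java snippet below is only a toy model of that control flow under loud assumptions: the class, the exception type, the NODE_LIMIT value and the "EA triples the graph" growth rule are all hypothetical and do not come from the webrev or from C2.

```java
// Hypothetical illustration class; nothing here exists in HotSpot.
public class RetryWithoutEaSketch {

    // Assumed node budget for one compilation (made-up number).
    static final int NODE_LIMIT = 1000;

    /** Signals that a compilation attempt ran out of node budget. */
    public static class NodeLimitExceeded extends RuntimeException {
        public NodeLimitExceeded(String message) {
            super(message);
        }
    }

    // Stand-in for a single compilation attempt.
    // Assumption: with EA on, splitting unique types triples the graph size.
    public static String compile(int baseNodes, boolean useEscapeAnalysis) {
        int nodes = useEscapeAnalysis ? baseNodes * 3 : baseNodes;
        if (nodes > NODE_LIMIT) {
            throw new NodeLimitExceeded("graph grew to " + nodes + " nodes");
        }
        return useEscapeAnalysis ? "compiled with EA" : "compiled without EA";
    }

    // First attempt uses EA; if it hits the limit, retry once without EA.
    // A failure of the second attempt is not retried and propagates.
    public static String compileWithFallback(int baseNodes) {
        try {
            return compile(baseNodes, true);
        } catch (NodeLimitExceeded e) {
            return compile(baseNodes, false);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileWithFallback(100));
        System.out.println(compileWithFallback(500));
    }
}
```

In this model a small method keeps EA, while a method whose graph only exceeds the budget because of EA is still compiled, just without the optimization.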

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667612/index.html

Fixed 6667612: Escape Analysis: disable loop cloning if it has a scalar replaceable allocation

Problem:
Cloning an allocation will not allow scalar replacement since memory operations can no longer be associated with one allocation; the unique type data is invalid after such cloning.

Solution:
Do not clone a loop with a scalar replaceable allocation.
Also use 1 instead of 'true' for bitfield assignments.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667615/index.html

Fixed 6667615: Escape Analysis: extend MDO to cache arguments escape state

Problem:
EA uses the bytecode escape analyzer to determine the escape state of allocations passed as arguments to a call that is not inlined. bcEscapeAnalyzer looks recursively at the method's bytecode and the bytecode of called methods. It is expensive to do this at each call site.

Solution:
Use MDO to cache arguments escape state.

--------------------------------------------------------------------------------------------
http://webrev.invokedynamic.info/kvn/6667620/index.html

Fixed 6667620: Escape Analysis: fix deoptimization for scalar replaced objects

Problem:
Deoptimization code for reallocation and relocking scalar replaced objects has to be fixed.