Native wrapper optimization
christian.thalinger at oracle.com
Mon Nov 21 01:41:21 PST 2011
On Nov 21, 2011, at 5:53 AM, Krystal Mok wrote:
> Hi all,
> (Just in case my company email strips the attachment again, I'm replying from my personal email.)
> I've got the patch ported to 32-bit x86. See attachment.
> Additional comments about the patch:
> Joseph's original patch moves the IC miss jump out-of-line, but on x64 with compressed oops, that doesn't really save space in the unverified entry point (UEP) code sequence, due to the 8-byte alignment. Examples in [1].
> The version in this mail's attachment uses an inline jump_cc() instead of jcc() plus an out-of-line jump(). The UEP code generated by both C1 and C2 uses the same pattern.
> There's a similar pattern in generate_i2c2i_adapters() that could have used jump_cc() to call the ic_miss_stub. But the gains don't look significant enough, so I didn't modify it.
Are there any performance regressions with this patch on older x86 architectures?
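The alignment remark in the quoted text can be illustrated with a small C++ sketch. The byte counts are hypothetical (the real sequences are in the notes linked in the quoted mail), and the sketch assumes the entry sequence is padded up to the next 8-byte boundary. The point is only that trimming a few bytes may leave the padded size unchanged.

```cpp
#include <cstddef>

// Round size up to the next multiple of alignment.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

constexpr std::size_t kAlignment = 8;

// Hypothetical byte counts, for illustration only (not measured from HotSpot).
constexpr std::size_t kInlineJumpBytes = 22;     // entry sequence with the jump inline
constexpr std::size_t kOutOfLineJumpBytes = 18;  // entry sequence with the jump moved out

// Both variants occupy the same space once padded to the alignment.
static_assert(align_up(kInlineJumpBytes, kAlignment) == 24,
              "inline variant pads to 24 bytes");
static_assert(align_up(kOutOfLineJumpBytes, kAlignment) == 24,
              "out-of-line variant also pads to 24 bytes");
```

With these assumed sizes, moving the jump out-of-line saves 4 bytes before padding and 0 bytes after it.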
> Another note:
> In x64's version of SharedRuntime::generate_dtrace_nmethod(), the IC check isn't using load_klass().
> __ verify_oop(receiver);
> __ cmpl(ic_reg, Address(receiver, oopDesc::klass_offset_in_bytes()));
> __ jcc(Assembler::equal, hit);
> Is this correct, or should it be modified to use load_klass(), too? My take is the latter.
It seems the 32-bit version was copied verbatim to the 64-bit one; this looks like a bug to me.
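To make the concern concrete, here is a minimal, self-contained C++ model of the check (not HotSpot code). It assumes that, with compressed oops, the klass field in the object header holds a 32-bit encoded value while the inline-cache register holds a full 64-bit klass address. The heap base, shift, example address, and all function names below are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding parameters (assumptions, not taken from HotSpot).
constexpr std::uint64_t kHeapBase = 0x0000000800000000ULL;
constexpr int kShift = 3;  // assumes 8-byte object alignment

// Compress a 64-bit address into a 32-bit header value.
constexpr std::uint32_t encode_narrow(std::uint64_t addr) {
    return static_cast<std::uint32_t>((addr - kHeapBase) >> kShift);
}

// Expand a 32-bit header value back into a 64-bit address.
constexpr std::uint64_t decode_narrow(std::uint32_t narrow) {
    return kHeapBase + (static_cast<std::uint64_t>(narrow) << kShift);
}

// Models a 32-bit compare of the IC register against the raw header field.
constexpr bool ic_hit_raw_cmpl(std::uint64_t ic_klass, std::uint32_t klass_field) {
    return static_cast<std::uint32_t>(ic_klass) == klass_field;
}

// Models decoding the header field first, then comparing at full width.
constexpr bool ic_hit_decoded(std::uint64_t ic_klass, std::uint32_t klass_field) {
    return ic_klass == decode_narrow(klass_field);
}

// Hypothetical klass address shared by the receiver and the inline cache.
constexpr std::uint64_t kExampleKlass = 0x0000000800001000ULL;

static_assert(encode_narrow(kExampleKlass) == 0x200u,
              "header stores the encoded value");
static_assert(!ic_hit_raw_cmpl(kExampleKlass, encode_narrow(kExampleKlass)),
              "raw 32-bit compare misses although the klass matches");
static_assert(ic_hit_decoded(kExampleKlass, encode_narrow(kExampleKlass)),
              "decoding first yields the expected hit");
```

Under these assumptions the raw 32-bit compare reports a miss for a receiver whose klass actually matches, whereas decoding first (roughly what load_klass() is for) gives the expected hit.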
> load_klass() was introduced in [2], and later, generate_dtrace_nmethod() was introduced in [3]. I think [3] missed the compressed oops changes.
> Kris Mok
> Software Engineer, Taobao (http://www.taobao.com)
> [1]: https://gist.github.com/1380416#file_notes.md
> [2]: http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/ba764ed4b6f2/src/cpu/x86/vm/sharedRuntime_x86_64.cpp
> [3]: http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/018d5b58dd4f/src/cpu/x86/vm/sharedRuntime_x86_64.cpp
> 2011/11/18 changren <changren at taobao.com>
> OK, Kris will help port it to 32-bit.
> On 2011-11-18 17:09, Christian Thalinger wrote:
> > Looks like a good patch to me. What about 32-bit x86?
> > -- Chris
> > On Nov 18, 2011, at 7:39 AM, changren wrote:
> >> Hi, all
> >> The attached patch (a diff against hsx20) is meant to speed up native
> >> invocation. It rearranges the compiled-to-native wrapper code to
> >> straighten branches, which improves spatial locality. A micro
> >> benchmark (500m consecutive JNI invocations with warm-up) shows that the
> >> stalled CPU cycles caused by instruction fetch due to L1 ICache misses
> >> decrease by 3.4% on the Intel Nehalem microarchitecture and by 9.6% on
> >> the Core microarchitecture. The real execution time of the micro
> >> benchmark also decreases by 5-10%, which reflects the improvement.
> >> Thanks,
> >> Joseph
> >> <JNIWrapperOpt.patch>