Native wrapper optimization

Krystal Mok rednaxelafx at
Sun Nov 20 20:53:49 PST 2011

Hi all,

(In case my company email strips the attachment again, I'm replying from
my personal email.)

I've got the patch ported to 32-bit x86. See attachment.

Additional comments about the patch:
Joseph's original patch moves the IC miss jump out-of-line, but on x64 with
compressed oops, that doesn't really save space in the unverified entry
point code sequence, due to the 8-byte alignment. Examples in [1].
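
To make the alignment point concrete, here is a minimal sketch with made-up byte counts (the real code sequences are the ones in [1]). It assumes the code following the unverified entry point is padded to an 8-byte boundary; the sizes 22 and 18 and all names are hypothetical, chosen only to show how a few saved bytes can disappear into padding.

```cpp
#include <cassert>
#include <cstddef>

// Round `size` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical byte counts, for illustration only; not measured from HotSpot.
constexpr std::size_t kEntryAlignment   = 8;   // the 8-byte alignment mentioned above
constexpr std::size_t kUepInlineJump    = 22;  // assumed UEP size, miss jump kept inline
constexpr std::size_t kUepOutOfLineJump = 18;  // assumed UEP size, miss jump moved out of line

// Bytes the UEP sequence occupies once the code after it is aligned.
constexpr std::size_t padded_uep_size(std::size_t raw_size) {
    return align_up(raw_size, kEntryAlignment);
}
```

With these assumed sizes both variants pad out to the same length, so the shorter inline sequence buys nothing in the UEP itself.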

The version in this mail's attachment uses jump_cc() inline instead of
jcc() and an out-of-line jump(). The UEP code generated by both C1 and C2
uses the same pattern.

There's a similar pattern in generate_i2c2i_adapters() that could have used
jump_cc() to call the ic_miss_stub. But the gains don't look significant
enough, so I didn't modify it.

Another note:
In x64's version of SharedRuntime::generate_dtrace_nmethod(), the IC check
isn't using load_klass().

__ verify_oop(receiver);
__ cmpl(ic_reg, Address(receiver, oopDesc::klass_offset_in_bytes()));
__ jcc(Assembler::equal, hit);

Is this correct, or should it be modified to use load_klass(), too? My take
is the latter.

load_klass() was introduced in [2], and later, generate_dtrace_nmethod()
was introduced in [3]. I think [3] missed the compressed oops changes.
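
A small self-contained model of why this could matter, assuming that with compressed oops the klass field holds a 32-bit encoded value that has to be decoded before it can be compared with the full pointer in ic_reg. The base, the shift, and every helper name below are hypothetical; they only mimic the idea behind load_klass() and are not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding parameters, for illustration only.
constexpr std::uint64_t kHeapBase = 0x0000000800000000ULL;
constexpr unsigned      kShift    = 3;

// Compress a full pointer to 32 bits (models the value stored in the header).
constexpr std::uint32_t encode(std::uint64_t full) {
    return static_cast<std::uint32_t>((full - kHeapBase) >> kShift);
}

// Expand a 32-bit value back to a full pointer (models the decode step).
constexpr std::uint64_t decode(std::uint32_t narrow) {
    return kHeapBase + (static_cast<std::uint64_t>(narrow) << kShift);
}

// Models a 32-bit compare of the expected pointer against the raw header
// field, with no decoding, as in the cmpl sequence quoted above.
constexpr bool raw_compare(std::uint64_t expected, std::uint32_t header_field) {
    return static_cast<std::uint32_t>(expected) == header_field;
}

// Models decoding the header field first, then comparing at full width.
constexpr bool decoded_compare(std::uint64_t expected, std::uint32_t header_field) {
    return expected == decode(header_field);
}
```

Under these assumptions, a receiver whose klass matches the expected one still fails raw_compare() and passes decoded_compare(), which is the kind of spurious IC miss the question above is about.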

Kris Mok
Software Engineer, Taobao (


2011/11/18 changren <changren at>

> Ok, Kris will help to port to 32bit.
> Thanks,
> Joseph
> On 2011-11-18 17:09, Christian Thalinger wrote:
> > Looks like a good patch to me.  What about 32-bit x86?
> >
> > -- Chris
> >
> > On Nov 18, 2011, at 7:39 AM, changren wrote:
> >
> >> Hi, all
> >> The attached patch (a diff against hsx20) is meant to speed up native
> >> invocation. It rearranges the compiled-to-native wrapper code to
> >> straighten branches, which improves spatial locality. A micro
> >> benchmark (500m consecutive JNI invocations with warm-up) shows that
> >> the stalled CPU cycles caused by instruction fetch due to L1 ICache
> >> misses decrease by 3.4% on the Intel Nehalem microarchitecture and by
> >> 9.6% on the Core microarchitecture. The real execution time of the
> >> micro benchmark also decreases by 5-10%, which reflects the improvement.
> >> Thanks,
> >> Joseph
> >>
> >>
> >> <JNIWrapperOpt.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: JNI_wrapper_ver2.patch
Type: application/octet-stream
Size: 7936 bytes
Desc: not available
Url : 

More information about the hotspot-dev mailing list