RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hiroshi H Horii HORII at jp.ibm.com
Fri Apr 22 12:28:13 UTC 2016

Dear all:

Can I please request reviews for the following change?

Code change:
(I created the initial version, and Martin enhanced it considerably.)

This change follows the discussion that started with this mail.

This change provides a relaxed compare-and-exchange by introducing
semantics similar to the C++ atomic memory orders, enum memory_order.
As described in atomic_linux_ppc.inline.hpp, the current implementation of
cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
general purposes because the two sync calls, one before and one after
cmpxchg, provide strict consistency. However, they sometimes cause overhead
because sync instructions are very expensive in the current POWER chip design.
In addition, on other platforms, such as aarch64, this strict consistency
may cause some overhead (according to Andrew's mail).

With this change, callers can explicitly specify the memory ordering
constraints for cmpxchg with an additional parameter, memory_order order.

typedef enum memory_order {
} memory_order;

Because the default value of the parameter is memory_order_seq_cst,
existing code keeps the same cmpxchg semantics without any
modification. The relaxed cmpxchg is implemented only on ppc
in this changeset. Therefore, the behavior on the other platforms is
not changed by this changeset.
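
As a rough illustration of the defaulted parameter, here is a minimal
sketch. It is not the code in the changeset: the namespace sketch, the
use of std::atomic<long>, and the fences standing in for the ppc sync
instructions are all assumptions made for this example, and the enum
lists only the two enumerators named in this mail.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Hypothetical stand-in for the enum in this mail; only the two
// enumerators that the mail names are listed.
enum memory_order {
  memory_order_relaxed,
  memory_order_seq_cst
};

// Returns the value observed at *dest. The exchange succeeded if and only
// if the returned value equals compare_value.
inline long cmpxchg(long exchange_value,
                    std::atomic<long>* dest,
                    long compare_value,
                    memory_order order = memory_order_seq_cst) {
  assert(dest != nullptr);
  const bool strict = (order == memory_order_seq_cst);

  // Assumption: a full fence stands in for the leading sync.
  if (strict) std::atomic_thread_fence(std::memory_order_seq_cst);

  long observed = compare_value;
  // On failure, compare_exchange_strong stores the current value in observed.
  dest->compare_exchange_strong(observed, exchange_value,
                                std::memory_order_relaxed,
                                std::memory_order_relaxed);

  // Assumption: a full fence stands in for the trailing sync.
  if (strict) std::atomic_thread_fence(std::memory_order_seq_cst);
  return observed;
}

}  // namespace sketch
```

A call without the fourth argument keeps the fenced behavior, while a
caller that passes sketch::memory_order_relaxed skips both fences.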

In addition, with the new parameter of cmpxchg, this change improves the
performance of copy_to_survivor in the parallel GC.
copy_to_survivor changes forwarding pointers by using cmpxchg. This
operation doesn't require any sync instructions: a pointer is changed
at most once in a GC, and when cmpxchg fails, the latest pointer is
available to the caller. cas_set_mark and cas_forward_to are extended
with an additional memory_order parameter, like cmpxchg, and
copy_to_survivor uses memory_order_relaxed to modify the forwarding pointers.
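
The forwarding-pointer update described above can be sketched as follows.
This is only an illustration under assumptions: Obj, forwardee, and
forward_to are hypothetical names, the real code works on the mark word
through cas_set_mark and cas_forward_to, and the sketch covers only the
pointer update itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object header: a null forwardee means "not yet forwarded".
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
};

// Tries to install new_copy as the forwarding pointer of obj using a
// relaxed compare-and-exchange. Returns the pointer that is installed when
// the call completes: new_copy if this caller won the race, otherwise the
// copy installed earlier by another caller.
inline Obj* forward_to(Obj* obj, Obj* new_copy) {
  assert(obj != nullptr && new_copy != nullptr);
  Obj* expected = nullptr;
  if (obj->forwardee.compare_exchange_strong(expected, new_copy,
                                             std::memory_order_relaxed,
                                             std::memory_order_relaxed)) {
    return new_copy;
  }
  // The pointer changes at most once, so the value reported by the failed
  // exchange is the final forwarding pointer.
  return expected;
}
```

The first caller to reach forward_to for a given object gets its own copy
back; every later caller gets the first caller's copy and can discard its
own.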

Summary of source code changes:

* src/share/vm/runtime/atomic.hpp 
     - Defines enum memory_order and adds a parameter to cmpxchg.

* src/share/vm/runtime/atomic.cpp
* src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
* src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
* src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
* src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
* src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
* src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
* src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
* src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
* src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
     - Adds a parameter to each cmpxchg function to follow
        the change in atomic.hpp. The implementations are not changed.

* src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
* src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
     - Adds a parameter to each cmpxchg function to follow
        the change in atomic.hpp. In addition, the implementations
        now depend on the specified memory_order.

* src/share/vm/oops/oop.hpp
* src/share/vm/oops/oop.inline.hpp
     - Adds a memory_order parameter so that cas_set_mark and
        cas_forward_to can use the relaxed cmpxchg.

* src/share/vm/gc/parallel/psPromotionManager.cpp
* src/share/vm/gc/parallel/psPromotionManager.inline.hpp

Martin tested this changeset on linuxx86_64, linuxppc64le and 
Though more time is needed to test on the other platforms, we would like to
request reviews and start discussion on this changeset.
I also tested this changeset with SPECjbb2013 and confirmed that GC pause
time is reduced.

Hiroshi Horii, Ph.D.
IBM Research - Tokyo

