RFR: 8224974: Implement JEP 352

Aleksey Shipilev shade at redhat.com
Thu Aug 1 11:16:05 UTC 2019

On 7/31/19 12:55 PM, Andrew Dinn wrote:
>> So if pre wbsync is no-op, why do we need to handle it everywhere? We seem to be falling through all
>> the way to the stub to do nothing there, maybe we should instead cut much earlier, e.g. when
>> inlining Unsafe.writeBackPresync0? Would it be better to not emit CacheWBPreSync at all?
> The pre sync is definitely not needed at present. However, I put it
> there because I didn't know for sure if some future port of this
> capability (e.g. to ppc) might need to sync prior writes before writing
> back cache lines. [Indeed, the old Intel documentation stated that
> pre-sync was needed on x86 for clflush to be safe but it is definitely
> not needed.]

I am more concerned that the writeback call enters the pre sync stub unnecessarily.

I had the idea to do this more efficiently, and simplify code at the same time: how about emitting
CacheWBPreSync nodes, but emitting nothing for them in .ad matchers? That would leave generic code
generic, and architectures would then be able to avoid the stub altogether for pre sync code. This
would simplify current stub generators too, I think: you don't need to pass arguments to them.

This leaves calling via Unsafe. I believe pulling up the isPre choice to the stub generation time
would be beneficial. That is, generate *two* stubs: StubRoutines::data_cache_writeback_pre_sync()
and StubRoutines::data_cache_writeback_post_sync(). If arch does not need the pre_sync, generate nop
version of pre_sync().

This is not a strong requirement from my side. I do believe it would make code a bit more

>> === src/hotspot/cpu/x86/assembler_x86.cpp
>> It feels like these comments are redundant, especially L8630 and L8646 which mention magic values
>> "6" and "7", not present in the code:


> 8624   // 0x66 is instruction prefix
> 8627   // 0x0f 0xAE is opcode family
> 8630   // rdi == 7 is extended opcode byte
> . . .
> Given that the code is simply stuffing numbers (whether supplied as
> literals or as symbolic constants) into a byte stream I think these
> comments are a help when it comes to cross-checking each specific
> assembly against the corresponding numbers declared in the Intel
> manuals. So, I don't really want to remove them. Would you prefer me to
> reverse the wording as above?

I was merely commenting on the style: the rest of the file does not have comments like that. The
positions of prefixes, opcode families, etc is kinda implied by the code shape.

>> === src/hotspot/cpu/x86/macroAssembler_x86.cpp
>   // prefer clwb (potentially parallel writeback without evict)
>   // otherwise prefer clflushopt (potentially parallel writeback
>   // with evict)
>   // otherwise fallback on clflush (serial writeback with evict)
> In the second case the comment is redundant because the need for an
> sfence is covered by the existing comment inside the if:
>     // need an sfence for post flush when using clflushopt or clwb
>     // otherwise no no need for any synchroniaztion

Yes, this would be good to add.

>> === src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>> Is it really "cmpl" here, not "cmpb"? I think aarch64 code tests for byte.
>> 2942     __ cmpl(is_pre, 0);
> This is a Java boolean input. I believe that means the value will be
> loaded into c_arg0 as an int so this test ought to be adequate.


>> === src/hotspot/share/opto/c2compiler.cpp
>> Why inject new cases here, instead of at the end of switch? Saves sudden "break":
>>  578     break;
>>  579   case vmIntrinsics::_writeback0:
>>  580     if (!Matcher::match_rule_supported(Op_CacheWB)) return false;
>>  581     break;
>>  582   case vmIntrinsics::_writebackPreSync0:
>>  583     if (!Matcher::match_rule_supported(Op_CacheWBPreSync)) return false;
>>  584     break;
>>  585   case vmIntrinsics::_writebackPostSync0:
>>  586     if (!Matcher::match_rule_supported(Op_CacheWBPostSync)) return false;
>>  587     break;
> I placed them here so they were close to the other Unsafe intrinsics. In
> particular they precede _allocateInstance, an ordering which is also the
> case in the declarations in vmSymbols.hpp.
> In what sense do you mean that an extra 'break' is saved? That would be
> true as regards the textual layout. It wouldn't affect the logic of
> folding different ranges of values into branching range tests (which is
> only determined by the numeric values of the intrinsics). If you are
> concerned about the former then I would argue that placing the values in
> declaration order seems to me to be the more important concern.

I don't think we have to follow whatever ordering mess in vmSymbols.hpp. New code cuts into the last
case block in that switch, which is mostly about "we know about these symbols, they are
falling-through to the break". Adding cases with Matcher::match_rule_supported seems odd there. If
anything, those new cases should be moved upwards to other cases, e.g. after vmIntrinsics::_minD.

>> === src/hotspot/share/prims/unsafe.cpp
>> Do we really need this function pointer mess?
>>  457   void (*wb)(void *);
>>  458   void *a = addr_from_java(line);
>>  459   wb = (void (*)(void *)) StubRoutines::data_cache_writeback();
>>  460   assert(wb != NULL, "generate writeback stub!");
>>  461   (*wb)(a);
>> Seems easier to:
>>  assert(StubRoutines::data_cache_writeback() != NULL, "sanity");
>>  StubRoutines::data_cache_writeback()(addr_from_java(line));
> Hmm, "that whole brevity thing" again? Well, I guess you must now call
> me "El Duderino".

Well, that is, like, your opinion, man. C++ is messy if we allow it to be!

> Are you sure that all the compilers used to build openJDK will happily
> eat the second line of your replacement? If you can guarantee that I'll
> happily remove the type declarations.

I think they do: there are uses like that in the same file already, for example:

  if (StubRoutines::unsafe_arraycopy() != NULL) {
    StubRoutines::UnsafeArrayCopy_stub()(src, dst, sz);
  } else {
    Copy::conjoint_memory_atomic(src, dst, sz);


More information about the core-libs-dev mailing list