RFR (S) 8131682: C1 should use multibyte nops everywhere
aleksey.shipilev at oracle.com
Mon Jul 20 14:51:12 UTC 2015
Thanks for taking a look!
Silly me, I should have left the call patching cases intact, because
you're right, we should be able to patch the nops partially while still
producing the correct instruction stream. Therefore, I reverted the
cases where we do nop-ing for *instruction* patching, and added the
Other places seem to use the nop sequences to provide the alignment, not
for the general patching. Especially interesting for us is the case of
aligning the patcheable immediate in the existing call. C2 does the nops
in these cases.
* JPRT -testset hotspot on open platforms;
* Targeted benchmarks, plus eyeballing the assembly;
On 18.07.2015 10:51, Dean Long wrote:
> I think we should distinguish the different uses and treat them
> 1) padding nops for patching, executed
> We need to be careful about inserting a fat nop here, if later patching
> overwrites only part of the fat nop, resulting in an illegal intruction.
> 2) padding nops for patching, never executed
> It should be safe insert a fat nop here, but there's no point if the
> nops are not reachable and never executed.
> 3) alignment nops, never patched, executed
> Fat nops are fine, but on some CPUs branching may be even better, so I
> suggest using align() for this, and letting align() decide what to
> generate. The change in check_icache() could use a version of align
> that takes the target offset as an argument:
> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size);
> 4) alignment nops, never patched, never executed
> Doesn't matter what we emit here, but we might as well make it
> understandable by humans using a debugger.
> I believe the patching nops in c1_CodeStubs_x86.cpp and
> c1_LIRAssembler.cpp are patched concurrently while the code is running,
> not at a safepoint, so it's not clear to me if it's safe to use fat nops
> on x86. I would consider those changes unsafe on x86 without further
> analysis of what happens during patching.
> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote:
>> Hi there,
>> C1 is not very good at inlining and intrisifying methods, and hence the
>> call performance is important there. One nit that we can see in the
>> generated code on x86 is that C1 uses the single-byte nops, even for
>> long nop strides.
>> This improvement fixes that:
>> - JPRT -testset hotspot on open platforms
>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1
>> (I understand the symmetric change is going to be needed in closed
>> parts, but let's polish the open part first).
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 819 bytes
Desc: OpenPGP digital signature
More information about the hotspot-compiler-dev