RFR: JDK-8213473: Prevent transformation of LoadB->AndI->CmpI pattern to facilitate testb instruction matching
vladimir.kozlov at oracle.com
Thu Nov 8 18:31:50 UTC 2018
I was actually thinking about testing on platforms which do not execute such instructions folding.
Like Aarch64 or SPARC, to see if your changes (not converting to LoadUB) affects them.
On 11/8/18 4:52 AM, Roman Kennke wrote:
> Hi Vladimir,
> I haven't (yet) come up with an actual micro-benchmark, and haven't yet
> run specjvm or such, but I've written a program that triggers the
> matching and shows the assembly.
> Executed like this (usual jmh):
> java -jar target/benchmarks.jar -prof perfasm
> Yields the following assembly (without patch):
> 0x00007f5ff0587edc: movzbl 0xc(%rsi),%r10d
> // unrelated
> 0,06% 0x00007f5ff0587ee1: mov $0x2,%r8d
> // unrelated
> 0x00007f5ff0587ee7: mov $0x1,%eax
> 0x00007f5ff0587eec: test $0x2a,%r10d
> and with patch:
> 0,08% 0x00007f27605888e7: testb $0x2a,0xc(%rsi)
> I don't know if there's any chance that this performs worse? Maybe the
> two instructions schedule better because CPU can do other work while
> waiting for memory? It surely saves a register, which might come useful
> under register pressure. And it compiles to more compact code.
> Do you want me to take it further and come up with an actual working
> microbenchmark? Or do you have benchmarks that you want to run?
>> Roman, your change seems fine.
>> I thought for SPARC it may regress but the only complex rule with
>> AndI+LoadUB I found is for ConvI2L:
>> match(Set dst (ConvI2L (AndI (LoadUB mem) mask)));
>> I did not find anything on other platforms.
>> Anyway it would be nice to run some performance testing.
>> On 11/7/18 3:46 AM, Roman Kennke wrote:
>>> In JDK-8203628, Aleksey added matchers to generate testb instruction (on
>>> x86) for LoadB->AndI->CmpI sequence. However, in AndINode::Ideal() we
>>> transform any LoadB->AndI to LoadUB->AndI which fails to match this
>>> pattern. Which means we never actually match for testb. The original
>>> matchers for LoadUB->AndI->CmpI needed to be removed because of
>>> The proposed enhancement would prevent the transformation if we have a
>>> straight LoadB->AndI->CmpI sequence. While the matcher is x86-only, I
>>> don't think it actually hurts to prevent this particular sequence. It
>>> should generate better code (e.g. single testb instructions) for things
>>> byte field;
>>> if ((field & 3) == 0) ..
>>> or even:
>>> if (field == x) .. because we always mask byte loads
>>> or even (not sure) same for boolean fields.
>>> Testing: passes hotspot/jtreg:tier1 here
>>> Thoughts? Reviews?
More information about the hotspot-compiler-dev