RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings
tobias.hartmann at oracle.com
Thu Jan 7 14:52:27 UTC 2016
On 07.01.2016 00:58, Vladimir Kozlov wrote:
> Andrew is right.
Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic.
I fixed this for the inflate and compress intrinsics.
> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up.
> StrInflatedCopyNode is not memory node.
Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says:
// This class defines a projection of the memory state of a store conditional node.
// These nodes return a value, but also update memory.
But inflate does not return any value.
Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly:
In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char destination, it returns:
which is equal to the type of the char load.
I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type:
Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type?
> On 1/6/16 5:34 AM, Andrew Haley wrote:
>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote:
>>> The problem here is that C2 reorders memory instructions and moves
>>> an array load before an array store. The MemBarCPUOrder is now used
>>> (compiler internally) to prevent this. We do the same for normal
>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual
>>> code is emitted. See also the comment in memnode.hpp:
>>> // Ordering within the same CPU. Used to order unsafe memory references
>>> // inside the compiler when we lack alias info. Not needed "outside" the
>>> // compiler because the CPU does all the ordering for us.
>>> "CPU does all the ordering for us" means that even with a relaxed
>>> memory ordering, loads are never moved before dependent stores.
>>> Or did I misunderstand your question?
>> No, I don't think so. I was just checking: I am very aware that
>> HotSpot has presented those of use with relaxed memory order machines
>> with some interesting gotchas over the years, that's all. I'm a bit
>> surprised that C2 needs this barrier, given that there is a
>> read-after-write dependency, but never mind.
More information about the hotspot-compiler-dev