Request for reviews (XL): 7119644: Increase superword's vector size up to 256 bits
vladimir.kozlov at oracle.com
Fri Jun 8 18:45:47 PDT 2012
I found that I can't use memory_type() because some ideal transformations (in
mulnode.cpp) may replace a char load with a short load and vice versa. So
instead of comparing vector types I added a new method, same_velt_type(n1, n2),
which compares element sizes for integer types. Also, there is no difference in
vector instructions for the Char and Short types, so I removed the Char-related
vector nodes and used Short vectors for the Char type.
I also added regression tests for boolean and char types.
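
To illustrate the element-size comparison described above, here is a minimal
Java model. It is only a sketch, not the HotSpot C++ code: the Elt enum, the
intLike flag and the set of types treated as "integer types" are assumptions
made for the example.

```java
public class VeltTypeSketch {
    // Hypothetical model of element types; these names are not HotSpot identifiers.
    enum Elt {
        BOOLEAN(1, true), BYTE(1, true), CHAR(2, true), SHORT(2, true), INT(4, true),
        LONG(8, false), FLOAT(4, false), DOUBLE(8, false);

        final int sizeInBytes;
        // Assumption: the types the mail calls "integer types".
        final boolean intLike;

        Elt(int sizeInBytes, boolean intLike) {
            this.sizeInBytes = sizeInBytes;
            this.intLike = intLike;
        }
    }

    // Integer-like types match when their element sizes match;
    // all other types must be identical.
    static boolean sameVeltType(Elt a, Elt b) {
        if (a.intLike && b.intLike) {
            return a.sizeInBytes == b.sizeInBytes;
        }
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameVeltType(Elt.CHAR, Elt.SHORT)); // true
        System.out.println(sameVeltType(Elt.SHORT, Elt.INT));  // false
        System.out.println(sameVeltType(Elt.INT, Elt.FLOAT));  // false
    }
}
```

With such a check, a char load that was replaced by a short load still lands in
the same pack as its neighbours, since both are 2 bytes wide.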
Vladimir Kozlov wrote:
> I found another problem. My new regression test TestShortVect failed in
> the overlapping case (load and store from the same array with an offset).
> Superword code thinks a store into a short array is a char store and
> different from a short load from a short array, so it allows overlapping. It
> happened because C2 has only one StoreC node for 2-byte stores but
> different LoadS and LoadUS nodes for loads. And these load nodes have
> different memory types: T_SHORT and T_CHAR.
> The simplest change is to add a new StoreS node with memory type T_SHORT.
> I like this approach and will go with it if nobody objects.
> Another solution is to extract the element type from the address type instead
> of memory_type() for memory nodes. But that duplicates parser code.
> Note, memory_type() is called in other C2 places only to determine
> element size. But in Superword it is used to construct the vector's type.
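
A small Java sketch of why the overlapping case matters. It is not the
TestShortVect source; OFFSET, LANES and both method names are hypothetical.
The second method only simulates what a vectorizer would compute if it wrongly
treated the load and the store as independent.

```java
import java.util.Arrays;

public class OverlapSketch {
    static final int OFFSET = 1; // hypothetical offset, smaller than the vector width
    static final int LANES = 8;  // hypothetical: 8 shorts in a 128-bit vector

    // Scalar semantics: each iteration reads the value stored by the previous one.
    static short[] scalar(short[] src) {
        short[] a = src.clone();
        for (int i = 0; i < a.length - OFFSET; i++) {
            a[i + OFFSET] = (short) (a[i] + 1);
        }
        return a;
    }

    // Simulates an incorrect vectorization: load LANES elements, then store them all.
    static short[] naiveVector(short[] src) {
        short[] a = src.clone();
        int n = a.length - OFFSET;
        int i = 0;
        for (; i + LANES <= n; i += LANES) {
            short[] v = new short[LANES];
            for (int l = 0; l < LANES; l++) {
                v[l] = (short) (a[i + l] + 1); // "vector" load and add
            }
            for (int l = 0; l < LANES; l++) {
                a[i + l + OFFSET] = v[l];      // "vector" store
            }
        }
        for (; i < n; i++) {
            a[i + OFFSET] = (short) (a[i] + 1); // scalar tail
        }
        return a;
    }

    public static void main(String[] args) {
        short[] in = new short[17];
        // The two results differ, so the loop must not be vectorized this way.
        System.out.println(Arrays.equals(scalar(in), naiveVector(in))); // false
    }
}
```

The scalar loop carries a dependence from one iteration to the next, which the
packed version loses. That is the dependence Superword misses when it believes
the store and the load have different types.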
> Vladimir Kozlov wrote:
>> Most changes after Tom's latest review were done in superword.cpp, and
>> I added regression tests. I think I nailed down the latest issues I had with
>> superword code. The changes pass all the testing I did. Please review them
>> (same web link).
>> On 4/3/12 10:03 AM, Vladimir Kozlov wrote:
>>> 7119644: Increase superword's vector size up to 256 bits
>>> Increase superword's vector size up to 256 bits for YMM AVX registers
>>> on x86. Added generation of different vector sizes
>>> for different types of arrays in the same loop. Allow generation of
>>> small (4-byte) vectors for loops which were unrolled
>>> a small number of times.
>>> Add new C2 types for vectors and rework the VectorNode implementation.
>>> Used MachTypeNode as the base node for vector mach nodes
>>> to keep the vector type.
>>> Moved XMM register definitions and vector instructions into one file,
>>> x86.ad (had to rename eRegI to rRegI in x86_32.ad).
>>> Tested with full CTW, NSK, C2 regression tests, JPRT and added a new test.
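
For reference, the lane arithmetic behind the sizes mentioned above can be
sketched as follows. This is plain arithmetic, not C2 code; the class and
method names are hypothetical.

```java
public class LaneCountSketch {
    // Number of elements that fit in a vector of the given size in bytes.
    static int lanes(int vectorBytes, int elementBytes) {
        return vectorBytes / elementBytes;
    }

    public static void main(String[] args) {
        int ymmBytes = 256 / 8; // a 256-bit YMM register holds 32 bytes
        System.out.println("byte:       " + lanes(ymmBytes, 1)); // 32
        System.out.println("short/char: " + lanes(ymmBytes, 2)); // 16
        System.out.println("int/float:  " + lanes(ymmBytes, 4)); // 8
        System.out.println("long/double:" + lanes(ymmBytes, 8)); // 4

        int smallBytes = 4; // the small vectors mentioned above
        System.out.println("byte (small):       " + lanes(smallBytes, 1)); // 4
        System.out.println("short/char (small): " + lanes(smallBytes, 2)); // 2
    }
}
```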