Request for review (S): 6636138: UseSuperWord enabled failure
Changpeng.Fang at Sun.COM
Wed Mar 18 16:32:25 PDT 2009
I have added two small test cases to help the review (and updated the webrev):
This test case shows the problem of superwordization of array copy.
After unrolling, part
of the loop looks like:
src[i] = i;
dst[i] = 2;
ref[i] = src[i]; // (1)
src[i+1] = i+1;
dst[i+1] = 2;
ref[i+1] = src[i+1]; // (2)
The loads src[i] and src[i+1] could be combined to use superword.
We know src[i]'s memory input is dst[i] and src[i+1]'s memory input is
dst[i+1].
The question is what's the memory state input for the corresponding
superword load?
The original approach used dst[i], which essentially moved the superword
load before the store to src[i+1],
and thus loaded a stale value for src[i+1].
My approach uses dst[i+1], which preserves the memory order.
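A minimal sketch of how the first case could be exercised as a standalone program. This is a reconstruction from the unrolled loop shown above, not the test case attached to the webrev; the class and method names are hypothetical, and whether a given JVM actually vectorizes the loop is not guaranteed.

```java
import java.util.Arrays;

// Hypothetical reconstruction; names are assumptions, not from the webrev.
class SuperWordLoadOrder {
    // The rolled form of the loop above: each iteration stores to src and
    // dst, then copies the freshly stored src[i] into ref[i].
    static void kernel(int[] src, int[] dst, int[] ref) {
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
            dst[i] = 2;
            ref[i] = src[i]; // must observe the store to src[i] above
        }
    }

    // Runs the kernel repeatedly (so the JIT has a chance to compile it) and
    // returns the first index where ref holds a stale value, or -1 if none.
    static int firstMismatch(int n, int rounds) {
        int[] src = new int[n];
        int[] dst = new int[n];
        int[] ref = new int[n];
        for (int r = 0; r < rounds; r++) {
            // Poison the arrays so a load scheduled too early is detectable.
            Arrays.fill(src, -1);
            Arrays.fill(ref, -1);
            kernel(src, dst, ref);
            for (int i = 0; i < n; i++) {
                if (ref[i] != i) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstMismatch(1024, 20_000);
        System.out.println(bad == -1 ? "PASSED" : "FAILED at index " + bad);
    }
}
```

A correct compiler must make firstMismatch return -1 regardless of how the loads are packed, because every ref[i] is read after src[i] = i in program order.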
This test case shows the problem of superwordization of stores. After
unrolling, part
of the loop looks like:
src[i] = src[i-1];
src[i-1] = invar; // (1)
src[i-1] = src[i-2]; // where to schedule the store src[i-1] and the load src[i-2]?
src[i-2] = invar; // (2)
src[i-1] = invar and src[i-2] = invar can be combined into a superword
store. The question is
where to put the sandwiched src[i-1] = src[i-2]?
The original approach put all of it (the load src[i-2] and the store
src[i-1]) before the superword, so the superword store overwrote src[i-1]
and left it with an incorrect value.
My approach is based on dependence analysis. As a result, the store
src[i-1] is scheduled after the superword,
and the load src[i-2] is scheduled before the superword.
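The second case can be sketched the same way. The loop below is a guess at the rolled form (a descending loop whose body is the first two statements of the unrolled code above), and the names are hypothetical. The expected result follows from executing the loop in program order: every element shifts up by one slot and src[0] ends up holding invar.

```java
// Hypothetical reconstruction; names are assumptions, not from the webrev.
class SuperWordStoreOrder {
    // Assumed rolled form: unrolling this descending loop once yields the
    // four statements shown above (stores (1) and (2) are the invar stores).
    static void kernel(int[] src, int invar) {
        for (int i = src.length - 1; i > 0; i--) {
            src[i] = src[i - 1];
            src[i - 1] = invar;
        }
    }

    // Runs the kernel repeatedly and returns the first index holding a wrong
    // value, or -1 if every round matched. Assumes n >= 2.
    static int firstMismatch(int n, int rounds, int invar) {
        int[] src = new int[n];
        for (int r = 0; r < rounds; r++) {
            for (int k = 0; k < n; k++) {
                src[k] = 3 * k + 1; // distinct values, none equal to invar = 7
            }
            kernel(src, invar);
            // In program order the loop shifts every element up by one slot
            // and leaves invar only in src[0].
            if (src[0] != invar) {
                return 0;
            }
            for (int k = 1; k < n; k++) {
                if (src[k] != 3 * (k - 1) + 1) {
                    return k;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstMismatch(1024, 20_000, 7);
        System.out.println(bad == -1 ? "PASSED" : "FAILED at index " + bad);
    }
}
```

If the sandwiched store src[i-1] = src[i-2] were scheduled before the packed invar stores, src[i-1] would end up holding invar instead of the shifted value, and the check would report that index.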
On 03/12/09 14:56, Changpeng Fang wrote:
> This patch also fixes the problem in CR 6812207: Possible bug in a
> floating point (float) arithmetic in 32-bit server HotSpot
> The Problem:
> SuperWord scheduling ignored memory dependences, which caused
> memory order violations, so the
> programs in CR 6636138 and CR 6812207 generated incorrect results.
> What Is in This Patch?:
> I have implemented a new superword scheduling approach to adjust the
> memory graph. For a store superword,
> we move all sandwiched memory operations outside the pack based on
> the dependence information.
> For a load superword, we use the latest memory state for the pack.
> Tested with: JPRT, scimark, specjvm98, volano25, jetstream, the test
> cases in CR 6636138 and CR 6812207, and several small self-developed cases.