IntVector.fromValues is not optimized away ?
paul.sandoz at oracle.com
Tue May 12 19:45:31 UTC 2020
> On May 11, 2020, at 6:39 PM, John Rose <john.r.rose at oracle.com> wrote:
> On May 11, 2020, at 5:14 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> I wonder if it's possible to teach the shared reduction code about operations using the identity value?
> In general, I’d encourage us to put as much into shared code as
> possible. We have more vector hardware in our future; I’m thinking
> of GPUs of course, and who knows what other CPUs or VPUs will be
> important in 10 years.
> BTW, this reminds me that in some cases reduction operations are
> most naturally formulated as type (scalar, vector) -> scalar, not just
> (vector) -> scalar. The two-argument form reduces to the one-argument
> form when the input scalar is the identity value. The two-argument
> form is useful when several vectors are being rolled up together,
> perhaps in a loop. I think we may want (not now but later) to make
> the building block be the two-argument reduction, not the simpler
> one-argument form.
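A minimal sketch of the two-argument form described above, assuming a plain int[] stands in for a vector. This is not the Vector API; the class and method names (TwoArgReduce, reduceInto, reduceAll, sumAll) are hypothetical and only illustrate the shape (scalar, vector) -> scalar and how the one-argument form falls out when the scalar is the identity value.

```java
import java.util.function.IntBinaryOperator;

public class TwoArgReduce {
    // Two-argument form: (scalar, vector) -> scalar.
    // The incoming scalar is folded together with every lane.
    public static int reduceInto(int acc, int[] lanes, IntBinaryOperator op) {
        for (int lane : lanes) {
            acc = op.applyAsInt(acc, lane);
        }
        return acc;
    }

    // One-argument form: (vector) -> scalar.
    // It is the two-argument form seeded with the operation's identity value.
    public static int reduceAll(int[] lanes, int identity, IntBinaryOperator op) {
        return reduceInto(identity, lanes, op);
    }

    // Rolling several vectors up together in a loop: the running scalar
    // is threaded through each call, so no separate combine step is needed.
    public static int sumAll(int[][] vectors) {
        int acc = 0; // identity for addition
        for (int[] v : vectors) {
            acc = reduceInto(acc, v, Integer::sum);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[][] vectors = { {1, 2, 3, 4}, {5, 6, 7, 8} };
        System.out.println(sumAll(vectors));
        System.out.println(reduceAll(vectors[0], 0, Integer::sum));
    }
}
```

With the one-argument form as the building block, the loop in sumAll would need an extra scalar operation per vector to merge each partial result into the running total.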
> Also BTW, and independently, we might wish to make a shared
> convention (in C2 and the Java code) that reductions are always
> done in some particular order, when it matters. If we do make
> such a choice, we should choose a particular binary spanning tree,
> since that, generally speaking, is how it’s done in hardware.
> Disagreements between spanning tree orders can be removed
> (if needed) by one-time permutations of the input.
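To make the remark about permutations concrete, here is a sketch under stated assumptions: a float[] of power-of-two length stands in for a vector, and two possible tree orders are modelled, one pairing adjacent lanes and one pairing lane i with lane i + n/2. Neither is claimed to be what C2 or any particular hardware does, and all names (TreeOrders, adjacentTree, halvingTree, bitReverse) are hypothetical. For these two orders, a bit-reversal permutation of the input is the one-time permutation that makes them agree.

```java
public class TreeOrders {
    // Adjacent-pair tree: (x0+x1), (x2+x3), ... at every level.
    // Assumes x.length is a power of two.
    public static float adjacentTree(float[] x) {
        float[] t = x.clone();
        for (int n = t.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                t[i] = t[2 * i] + t[2 * i + 1];
            }
        }
        return t[0];
    }

    // Halving tree: lane i is combined with lane i + h, where h is half
    // of the remaining width. Assumes x.length is a power of two.
    public static float halvingTree(float[] x) {
        float[] t = x.clone();
        for (int h = t.length / 2; h >= 1; h /= 2) {
            for (int i = 0; i < h; i++) {
                t[i] = t[i] + t[i + h];
            }
        }
        return t[0];
    }

    // Bit-reversal permutation of a power-of-two-length array.
    public static float[] bitReverse(float[] x) {
        int n = x.length;
        int bits = Integer.numberOfTrailingZeros(n);
        float[] p = new float[n];
        for (int i = 0; i < n; i++) {
            int j = (bits == 0) ? 0 : Integer.reverse(i) >>> (32 - bits);
            p[i] = x[j];
        }
        return p;
    }

    public static void main(String[] args) {
        float[] x = {1e8f, 1f, -1e8f, 1f};
        System.out.println(adjacentTree(x));             // one tree order
        System.out.println(halvingTree(x));              // another tree order
        System.out.println(adjacentTree(bitReverse(x))); // matches halvingTree(x)
    }
}
```

The two trees perform the same additions on the permuted input, so the results match bit for bit, not just approximately.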
> It seems to me that the two observations work against each other,
> since you can’t build such a good spanning tree on 1+2^lgN nodes
> as you can on 2^lgN nodes. This is one reason we need some time
> (after the current release) to consider the proper order specification
> for reductions in our portable API.
> (BTW, the difference in order only matters with floating operations
> that have NaNs and/or rounding errors. So the problems with order
> are limited only to those, and whatever other non-associative
> operations we might define in the future.)
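A small illustration of the point about order, again using plain arrays in place of vectors (the class and method names are hypothetical). It compares a left-to-right sequential sum with an adjacent-pair tree sum: for float the two orders can round differently, while for int, where wrapping addition is associative, they always agree.

```java
public class OrderMatters {
    // Left-to-right reduction: N-1 dependent additions.
    public static float sequentialSum(float[] x) {
        float acc = 0f;
        for (float v : x) {
            acc += v;
        }
        return acc;
    }

    // Adjacent-pair tree reduction: lgN levels of independent additions.
    // Assumes x.length is a power of two.
    public static float treeSum(float[] x) {
        float[] t = x.clone();
        for (int n = t.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                t[i] = t[2 * i] + t[2 * i + 1];
            }
        }
        return t[0];
    }

    // Same two orders for int lanes.
    public static int sequentialSum(int[] x) {
        int acc = 0;
        for (int v : x) {
            acc += v;
        }
        return acc;
    }

    public static int treeSum(int[] x) {
        int[] t = x.clone();
        for (int n = t.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                t[i] = t[2 * i] + t[2 * i + 1];
            }
        }
        return t[0];
    }

    public static void main(String[] args) {
        // 1e8f + 1f rounds back to 1e8f, so the grouping decides
        // whether the small terms survive.
        float[] f = {1e8f, 1f, -1e8f, 1f};
        System.out.println(sequentialSum(f) + " vs " + treeSum(f));

        // Overflow wraps, but the result is independent of grouping.
        int[] i = {Integer.MAX_VALUE, 1, -Integer.MAX_VALUE, 1};
        System.out.println(sequentialSum(i) + " vs " + treeSum(i));
    }
}
```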
> Two arguments in favor of reducing in N-1 sequential steps instead
> of lgN steps of parallel operations: It’s the simplest to specify, and
> works best with the binary version of reduction. One argument
> against: It will make rounding and NaNs slow FP operations down.
> Maybe there’s a “strictfp” move we can use to allow the JVM more
> latitude for reordering reductions into lgN trees, except in strict code.
> — John