Primitive specialization and arrays
dl at cs.oswego.edu
Fri Sep 14 05:40:18 PDT 2012
(Finally getting a chance to post about some of the library
issues that have been building up...)
The parallel aspects of Brian's Stream-based APIs are targeted to a
different audience than those aiming to fully parallelize aggregate
operations. Which is fine. My CHM.Parallel APIs address one of these
audiences -- those doing Hadoop/etc-like processing on possibly-"live"
key-value pairs. The other is of course Array-based processing, that
was prototyped long ago as extra166y.ParallelArray. Parallel array
operations extend those that you can/would do under Stream APIs in
part because of indexing -- operations can proceed with higher
parallelism so long as the right args/results are in the right
indices. (They are less parallel than CHM for some operations though,
since CHM fully arbitrates contention per-key/entry rather than
relying on execution control to guarantee exclusion/quiescence.) Also,
there is much more demand for parallel ops on both CHM and arrays to
include in-place updates. This in part because they are often very large,
and so you can't afford to pretend that pure functional programming is
the only path to happy software :-) But also because some operations,
like sorting, are most naturally done in-place anyway.
It would not be hard to re-introduce one or more classes that sit
"aside" the Stream framework in the same way that CHM.Parallel does --
trading off poorer usability for more extensive functionality.
So, no explicit fluency/streaminess etc, but still amenable to
special-case translation to/from it. But there are a bunch of
added issues, including:
1. Unlike CHM, we really do need specialized Double, Long, Int
versions, because operations on primitive elements in arrays are
extremely common. There are not already plain sequential forms of
primitive specializations. If we think they are needed (and for
consistency, also the ref version), there would be either 8 classes,
or 4 classes, each with par/seq views. Possible names:
ParallelArray, ParallelDoubleArray, ParallelLongArray, ParallelIntArray
SequentialArray, SequentialDoubleArray, SequentialLongArray,
Or both, where each has methods par()/seq();
BulkArray, BulkDoubleArray, BulkLongArray, BulkIntArray
Just to pick something, I'll use first choice below.
The proposed Numeric interface will reduce some of the
sprawl with these, but they will still be plenty sprawly
(in part because they require a surprisingly large number
of function type interfaces.)
2. Essentially all of the parallel ops on arrays not covered
already via Stream-ized ArrayList etc are those that not
only don't focus on the List/Collection-like aspects, they
preclude them -- in particular no size changes. So it
might make sense to (1) treat an ArrayList as one
kind of "builder" for ParallelArray, adding method
ArrayList.toParallelArray() or static ParallelArray.from(ArrayList)
that does a handoff causing the ArrayList to act empty after
handoff. (2) Either support ParallelArray.asImmutableList() to
enable basic read-only collections stuff on them, or just
implement (the non-interface :-) ImmutableList directly.
(3) For the specialized primitive versions, some other TBD
"build phase" support might be warranted. Or just have a
plain hand-off constructor for arrays created in any way
people want to do.
3. Concurrency considerations force CHM to use "null means nothing
there" conventions (which turn out to simplify many other
issues/constructions). This also applies to arrays of references
(as in a[i] == null), but not arrays of primitives or mixed-mode
operations. All of the ways of dealing with this are objectionable
to some part of target audience, but we'll need to settle on one.
4. Parallel operations on sub-arrays are also common, but as slices
(origin <= index < fence), not as sublists (0 <= index < size(),
offset from parent). This can either be done by supporting
overloaded versions of forEach, map, reduce, replace etc methods
taking origin,fence, or introducing a Slice interface. Neither way
is always better; we'll need to pick one.
5. If you have array-based classes supporting bulk ops, is there
any reason to support any other collection-like forms (List, Set?)
Or any Stream-like forms? Not clear.
My own indecision about some of these issues has
kept me from doing all this out (which is not all that hard --
I've already done it once with ParallelArray). And further
kept me from even proposing this path at all for JDK (as opposed
to some non-JDK add-on package). Any thoughts would be welcome.
More information about the lambda-libs-spec-observers