Performance model of sorted() ?
paul.sandoz at oracle.com
Mon Nov 4 01:13:08 PST 2013
On Nov 1, 2013, at 7:18 PM, Brian Goetz <brian.goetz at Oracle.COM> wrote:
> The "slicing" part is right, but the single-threaded assumption is wrong.
> What happens is that the upstream stateless ops will be done in
> parallel, with the results being collected at the leaves of the
> computation tree, into a bunch of mini-buffers. Now, each will be
> sorted (in parallel), and then we begin a parallel merge, collecting the
> results into an array. Then the downstream stateless ops and the
> terminal op are run, in parallel.
That is what we wanted/planned to do :-) but for reasons of expediency we currently resort to collecting, in parallel, to an array and then using Arrays.parallelSort e.g. the equivalent of:
Stream s = ...
s = s.parallel();
Object a = s.toArray();
s = Arrays.stream(a);
So there is some redundancy and Arrays.parallelSort will not go parallel if the size of the array is less than 2^13.
Unfortunately it will be tricky to share any code from Arrays.parallelSort, thus requiring duplication x 4 (for refs + primitives). (Arrays.parallelArray splits into quarters to ensure the final sort is in the main array but in our case i don't think we require that.)
For an optimal implementation that computation tree needs to consist of nodes that are F/J tasks (such that collection and merging can happen concurrently) and we need to support sized and non-sized pipelines slightly differently for the collection phase. Given that we can use Arrays.parallelArray code as a template i think most of the hard work is done.
> So, no truly sequential steps imposed by the framework (though binary
> parallel merges are generally dominated by the last merge step.)
More information about the lambda-dev