Parallelism cost function
paul.sandoz at oracle.com
Wed Jan 29 02:27:30 PST 2014
On Jan 29, 2014, at 3:46 AM, Sam Pullara <spullara at gmail.com> wrote:
> I think that we should have a lot more information here about when it is appropriate to use the parallelStream() call unless we are going to make sure that it executes inappropriate workloads sequentially.
Very tricky problem to solve, since we don't know what the per-element cost of the pipeline is. (We have some indications as to the quality of splitting the source.)
> I’d hate to have a generation of Java programmers randomly adding .parallelStream() to all their Streams just because they think it will always be faster.
In our presentations we explicitly talk about this ("Going parallel is easy to do, but not always the right thing to do") and help developers derive a mental model (and, when in doubt, always measure!).
So far we have deliberately avoided getting into the details of this in the JavaDoc; it could easily take up a few chapters of a book.
However, in hindsight we could have a section highlighting the relevant areas, such as the source size and splitting characteristics, the cost per element, and imbalance that makes the computation "lumpy", to at least head off the meme that "parallel always equals faster". I believe we could add such a section to the docs of an 8u release.
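To make "splitting characteristics" a bit more concrete, here is a rough sketch (not taken from the JDK docs; the class and method names are made up for illustration) that asks a few common sources what their spliterators report and how one split divides the estimated work. It only uses the public java.util.Spliterator API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

// Hypothetical helper class, for illustration only.
public class SplitQuality {

    // True if the source's spliterator reports both SIZED and SUBSIZED,
    // i.e. the source and every split of it know their exact element counts.
    static boolean splitsAreSized(Collection<?> source) {
        return source.spliterator()
                     .hasCharacteristics(Spliterator.SIZED | Spliterator.SUBSIZED);
    }

    // Splits the source once and returns the estimated sizes {prefix, remainder}.
    static long[] estimatedHalves(Collection<?> source) {
        Spliterator<?> rest = source.spliterator();
        Spliterator<?> prefix = rest.trySplit(); // null if the source cannot split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] {prefixSize, rest.estimateSize()};
    }

    static void report(String name, Collection<?> source) {
        long[] halves = estimatedHalves(source);
        System.out.println(name + ": sized splits=" + splitsAreSized(source)
                + ", first split=" + halves[0] + " / " + halves[1]);
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            numbers.add(i);
        }
        report("ArrayList", numbers);
        report("LinkedList", new LinkedList<>(numbers));
        report("HashSet", new HashSet<>(numbers));
    }
}
```

An array-backed list should split into two equal halves with exact sizes, while the other sources give the framework less to work with. None of this says anything about the cost per element, which still has to be measured.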
We could provide some helper tooling leveraging jmh.
"flatMap pushed loads of elements into the stream but I ain't seeing any speed up, why?", "that's because the source, which is a Set, only contains a few elements that happen to be mostly lumped in the same bucket".
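The Set case can be poked at directly. The sketch below is only an illustration (class and method names are invented) and it leans on an implementation detail I am assuming rather than anything specified: that HashSet is backed by a 16-bucket hash table here, so the Integer keys 0, 16, 32 and 48 all land in the same bucket. It splits the set's spliterator once and counts what each side really contains. In a parallel pipeline it is the source that gets split, so whatever flatMap generates for an element tends to stay on the task that received that element.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Spliterator;

// Hypothetical demo class, for illustration only.
public class LumpySetSplit {

    // Counts the elements actually left in a spliterator (this consumes it).
    static long countRemaining(Spliterator<?> s) {
        long[] n = {0};
        s.forEachRemaining(e -> n[0]++);
        return n[0];
    }

    // Splits the set's spliterator once and returns the real element
    // counts {prefix, remainder}, as opposed to the estimated sizes.
    static long[] splitOnce(Set<Integer> set) {
        Spliterator<Integer> rest = set.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if it cannot split
        long prefixCount = (prefix == null) ? 0 : countRemaining(prefix);
        return new long[] {prefixCount, countRemaining(rest)};
    }

    public static void main(String[] args) {
        // Assumption: with a 16-bucket table these keys share one bucket.
        Set<Integer> lumpy = new HashSet<>(Arrays.asList(0, 16, 32, 48));
        long[] halves = splitOnce(lumpy);
        System.out.println("prefix=" + halves[0] + ", remainder=" + halves[1]);
    }
}
```

If the assumption about the table holds, one side of the split should end up with all four elements and the other with none, which is the "lumpy" situation in the quoted exchange: one task does all of the flatMap work while the rest sit idle.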
More information about the lambda-dev