RFR-8148250: Stream.limit parallel ordered performance

Tagir F. Valeev amaembo at gmail.com
Fri Jan 29 13:47:30 UTC 2016


Here's webrev:

With best regards,
Tagir Valeev.

>> On 26 Jan 2016, at 16:51, Tagir F. Valeev <amaembo at gmail.com> wrote:
>> Hello!
>> Thank you for review! Here's the issue:
>> https://bugs.openjdk.java.net/browse/JDK-8148250
>> Will post complete webrev later.
>> PS> Note that it is still the case that in almost all scenarios this
>> PS> is likely to be a bad form of parallel stream.
>> Well not always it's possible to estimate in advance the size of the
>> stream. Consider that we have user-specified filter upstream which
>> searches over the big collection (we want to return first 10 search
>> results, order is important):
>> data.parallelStream()
>>    .filter(element -> userQuery.test(element))
>>    .limit(10).collect(toList());
>> If user query produces 10-15 results, using parallel stream is very
>> reasonable, but if it produces millions of results it should not
>> regress very much (at least should not become much slower than
>> sequential version which is what we see currently).

PS> I have my doubts that the cost of splitting and filtering a small
PS> number of elements concurrently will pay off in the majority of
PS> scenarios, hence the “almost all”.

PS> It could work in cases where there is lots of data to be filtered
PS> and only a few items are reported that are proportionally spread
PS> out, or over small data and the filter operation is costly.

PS> In any case it’s good to avoid the OOME, i am very glad you found a simple way to resolve that.

>> PS> I think the comment you refer to still applies but now for larger n, so we should refine it.
>> Should we replace "regardless of the value of n" with "when
>> n*parallelismLevel is sufficiently large”?

PS> Yes, when N * P is sufficiently large e.g to pluck a number out of the air > 2^32, say

PS> Paul.

More information about the core-libs-dev mailing list