RFR-8148250: Stream.limit parallel ordered performance
Tagir F. Valeev
amaembo at gmail.com
Fri Jan 29 13:47:30 UTC 2016
With best regards,
>> On 26 Jan 2016, at 16:51, Tagir F. Valeev <amaembo at gmail.com> wrote:
>> Thank you for the review! Here's the issue:
>> Will post complete webrev later.
>> PS> Note that it is still the case that in almost all scenarios this
>> PS> is likely to be a bad form of parallel stream.
>> Well, it's not always possible to estimate the size of the stream in
>> advance. Consider a user-specified filter upstream that searches over
>> a big collection (we want to return the first 10 search results, and
>> order is important):
>> .filter(element -> userQuery.test(element))
>> If the user query produces 10-15 results, using a parallel stream is
>> very reasonable, but if it produces millions of results, it should not
>> regress very much (at least it should not become much slower than the
>> sequential version, which is what we see currently).
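For illustration, the scenario quoted above could be sketched roughly as follows. This is only a sketch under assumptions: the class FirstMatches, the helper firstMatches, and the sample data are hypothetical names chosen for the example and are not taken from the webrev. It shows an ordered parallel stream with a user-supplied filter followed by limit(10).

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FirstMatches {
    // Hypothetical helper: returns the first `limit` elements of `source`,
    // in encounter order, that satisfy the user-supplied predicate.
    static <T> List<T> firstMatches(List<T> source, Predicate<T> userQuery, int limit) {
        return source.parallelStream()
                .filter(element -> userQuery.test(element))
                .limit(limit)               // ordered limit: keeps the first matches
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sample data: one million boxed integers in ascending order.
        List<Integer> data = IntStream.range(0, 1_000_000)
                .boxed()
                .collect(Collectors.toList());

        // Selective query: only a handful of elements match.
        System.out.println(firstMatches(data, x -> x % 100_000 == 0, 10));

        // Non-selective query: about half of the elements match, which is
        // the case where the ordered parallel limit should not regress badly.
        System.out.println(firstMatches(data, x -> x % 2 == 0, 10));
    }
}
```

Because a List source is ordered, both calls return matches in encounter order regardless of how the work is split; only the running time differs between the two queries.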
PS> I have my doubts that the cost of splitting and filtering a small
PS> number of elements concurrently will pay off in the majority of
PS> scenarios, hence the “almost all”.
PS> It could work in cases where there is lots of data to be filtered
PS> and only a few items are reported that are proportionally spread
PS> out, or over small data and the filter operation is costly.
PS> In any case it’s good to avoid the OOME, I am very glad you found a simple way to resolve that.
>> PS> I think the comment you refer to still applies but now for larger n, so we should refine it.
>> Should we replace "regardless of the value of n" with "when
>> n*parallelismLevel is sufficiently large"?
PS> Yes, when N * P is sufficiently large, e.g., to pluck a number out of the air, > 2^32.