Speed optimization of Spliterators.spliteratorUnknownSize for parallel case
paul.sandoz at oracle.com
Wed Jul 15 12:20:45 UTC 2015
On Jul 14, 2015, at 4:43 AM, Tagir F. Valeev <amaembo at gmail.com> wrote:
> Thank you for the detailed answer.
> PS> Thanks for looking at this. Judging by the results i am guessing
> PS> your measurements were performed on a 4-core system.
> Yes, quad-core, I mentioned it before.
> PS> My initial inclination for these scenarios is it is likely better
> PS> for the developer to split the execution in two parts. The first
> PS> part, low N and low Q, is to collect sequentially into an array or
> PS> list. The second part (total size is known), low N and high Q,
> PS> would then use that array/list as the stream source and operate in parallel.
> Probably it would be fine to add such recommendations to the
> corresponding methods javaDoc (for example, BufferedReader.lines).
I think it is more suitable for:
> PS> That, in a sense, is almost what your spliterator does for n <
> PS> initial batch size -:) but because the target threshold size is
> PS> derived from the unknown root size (Long.MAX_VALUE) the
> PS> sub-spliterators are not themselves further split, so the
> PS> parallelism is better but still limited. It's tricky to adjust the
> PS> target size dynamically, we could do it for the first split (e.g.
> PS> unknown size -> two known size), but tracking further splits is
> PS> harder and may not reliably perform (due to concurrent
> PS> sub-splitting). I generally don't want to complicate the
> PS> computation mechanism for poor splitting use-cases.
> I see. That actually was also the thing which I wanted to suggest.
> Still even improving the first split would be great, I'd vote for it.
I am still hesitating. It's not general enough (optimize for poor splitting under fragile scenarios for N elements under batch size) (FWIW the general computation already alternates left/right forking to avoid such spliterators causing out of memory issues where the computation runs away creating tasks).
I also pondered exposing the batch size as an argument, but i don't think developers will generally know what to do with it, where as i suspect a size estimate is much easier to understand (even if guarded language in terms of proportions e.g last least N).
And of course this does not stop us internally using an alternative spliterator implementation for BufferedReader.lines such as suggested by Marko, which i think could be improved if a size estimate is given, as this could also influence the batch size and common difference.
> This way AbstractSpliterator user implementations may also benefit.
> My optimization cannot be applied to AbstractSpliterator as we cannot
> control the tryAdvance method.
> PS> Your approach also degenerates when N is just over the current batch size.
> Yes, it becomes bad, but my measurement show that it's still no worse
> than current implementation:
Yes, no worse.
> Benchmark (parallel) (size) Mode Cnt Score Error Units
> IteratorTest.base false 1028 avgt 30 659.308 ± 12.618 us/op
> IteratorTest.base false 1100 avgt 30 746.619 ± 21.260 us/op
> IteratorTest.base true 1028 avgt 30 194.800 ± 1.988 us/op
> IteratorTest.base true 1100 avgt 30 228.926 ± 2.923 us/op
> IteratorTest.jdkSpliterator false 1028 avgt 30 648.622 ± 3.857 us/op
> IteratorTest.jdkSpliterator false 1100 avgt 30 750.741 ± 10.279 us/op
> IteratorTest.jdkSpliterator true 1028 avgt 30 673.574 ± 6.469 us/op
> IteratorTest.jdkSpliterator true 1100 avgt 30 679.499 ± 4.310 us/op
> IteratorTest.optimized false 1028 avgt 30 643.209 ± 1.686 us/op
> IteratorTest.optimized false 1100 avgt 30 718.077 ± 8.123 us/op
> IteratorTest.optimized true 1028 avgt 30 674.447 ± 5.153 us/op
> IteratorTest.optimized true 1100 avgt 30 674.622 ± 4.252 us/op
> PS> It does not make as much sense to include such an estimate
> PS> argument for Files.lines for two reasons:
> PS> 1) I recently improved Files.lines for common character sets
> PS> where it is efficient to identify encoded line feeds (such as UTF-8); and
> PS> 2) for other character sets we could derive an estimate from the
> PS> file size (it could actually be the file size, which is the case for 1).
> I see. FileChannelLinesSpliterator is a nice improvement. Have you
> tested it on device files, named pipes or cases when the file size
> is changed in-between by some other process?
No. There is a general clause saying "all bets are off" if the file contents are modified during stream execution. The size is currently snapshot on the Files.lines call. If the file size increases in the interim i suspect it will still work, if it reduces then i would expect an I/O exception.
> Also probably it should
> be mentioned in BufferedReader.lines documentation, that using
> Files.lines for file source is preferred.
Good point a "@see Files.lines ..." would be appropriate.
> PS> So i am inclined to do the following:
> PS> 1) consider adding a size estimate argument for all stream
> PS> factories that are of unknown size and where no reasonable estimate can be derived.
> PS> 2) modify the spliterator factories in Spliterators to accept an estimate e.g.:
> PS> public static <T> Spliterator<T>
> PS> spliteratorUnknownSize(Iterator<? extends T> iterator,
> PS> long sizeEstimate,
> PS> int characteristics)
> By the way currently it's possible to create an IteratorSpliterator
> with estimated size:
> Spliterators.spliterator(iterator, estimatedSize, Spliterator.CONCURRENT);
> Of course it's a misuse of CONCURRENT flag, but it's not used
> otherwisely by stream API, thus currently it has no unwanted
Very sneaky :-) but not something we cannot encourage.
More information about the core-libs-dev