Questions about Stream/Iterable/Files - and possibly the compiler

Tagir F. Valeev amaembo at
Fri Nov 6 17:17:43 UTC 2015


>> Thanks to Remi and Paul for the complete explanation. Concerning JDK-8141608, I like Peter Levart's comment about making a specific Collector.

PS> There is a problem with that approach. At the moment the
PS> Collector does not get to control whether the stream is executed in parallel or sequentially.

It would actually be nice to have a special characteristic for such
cases, like Collector.Characteristics.SEQUENTIAL. It would signal that
the combiner must never be called (so it may simply throw
UnsupportedOperationException). The implementation for this case would
look like this (ReferencePipeline::collect):

public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
  A container;
  if (isParallel() &&
      collector.characteristics().contains(Characteristics.SEQUENTIAL)) {
    // Sequential-only collector: accumulate into a single container in
    // encounter order and never call the combiner.
    container = collector.supplier().get();
    BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
    forEachOrdered(u -> accumulator.accept(container, u));
  } else ... // existing code follows
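
The same idea can be approximated outside the JDK today. What follows is
only a minimal sketch, not the proposed patch: collectSequentially is a
hypothetical helper name, and the sketch relies on the documented behavior
of Stream.forEachOrdered, which processes elements one at a time in
encounter order, so a single non-thread-safe container is enough even when
the stream is parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class SequentialCollect {
    // Hypothetical helper (not a JDK method): feeds every element into ONE
    // container via forEachOrdered, so the combiner is never called.
    static <T, A, R> R collectSequentially(Stream<T> stream,
                                           Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        // Upstream stages (map, filter, ...) may still run in parallel;
        // only this final accumulation step is serialized.
        stream.forEachOrdered(t -> accumulator.accept(container, t));
        return collector.finisher().apply(container);
    }

    public static void main(String[] args) {
        // A collector whose combiner must never run.
        Collector<Integer, List<Integer>, List<Integer>> hostile = Collector.of(
                ArrayList::new,
                List::add,
                (a, b) -> { throw new UnsupportedOperationException("sequential only"); });

        Stream<Integer> doubled =
                IntStream.range(0, 1000).boxed().parallel().map(i -> i * 2);
        List<Integer> result = collectSequentially(doubled, hostile);
        System.out.println(result.size());
    }
}
```

The helper ignores the collector's characteristics entirely; a real
implementation inside ReferencePipeline would take this path only when the
(proposed, not existing) SEQUENTIAL characteristic is present.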

Special static factory methods could be added, such as
Collector.ofSequential(supplier, accumulator) and
Collector.ofSequential(supplier, accumulator, finisher). The existing
Collectors::groupingBy/groupingByConcurrent/partitioningBy should also be
updated to honor this characteristic on the downstream collector.
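
For illustration, such a factory can be imitated with the current API.
This is a sketch under assumptions: SequentialCollectors.ofSequential is a
hypothetical stand-in for the proposed method, and since no SEQUENTIAL
characteristic exists today, the best it can do is install a combiner that
fails loudly instead of returning a wrong result.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class SequentialCollectors {
    // Hypothetical stand-in for the proposed
    // Collector.ofSequential(supplier, accumulator, finisher).
    static <T, A, R> Collector<T, A, R> ofSequential(Supplier<A> supplier,
                                                     BiConsumer<A, T> accumulator,
                                                     Function<A, R> finisher) {
        return Collector.of(
                supplier,
                accumulator,
                // No combiner is supplied by the caller; fail loudly if the
                // stream implementation ever tries to merge two containers.
                (left, right) -> {
                    throw new UnsupportedOperationException("collector is sequential-only");
                },
                finisher);
    }

    public static void main(String[] args) {
        Collector<String, StringBuilder, String> concat = ofSequential(
                StringBuilder::new,
                (sb, s) -> sb.append(s),
                StringBuilder::toString);

        // Works on a sequential stream, where the combiner is never needed.
        System.out.println(Stream.of("a", "b", "c").collect(concat));
    }
}
```

With the real characteristic in place, the stream implementation could
detect such a collector and skip the combiner, rather than running into
the exception on a parallel stream.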

This is somewhat similar to the proposed foldLeft feature
(JDK-8133680). Quite often people write collectors that don't support
parallel collection: either their combiners throw an exception or
(even worse) silently produce something incorrect (like (a, b) -> a).
See, for example:

The library provides a special "convenient" static method to create such
a combiner. I don't like this library at all, but people really use it.
Such solutions are also sometimes posted on StackOverflow:
Shame on me, I also did this:

So with a special characteristic, collectors with such parallel-hostile
combiners would at least work correctly for parallel streams (and the user
may still get some speedup if there are heavy upstream operations).
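
To make the silent failure mode concrete, here is a small sketch of a
collector with the (a, b) -> a combiner mentioned above. The class and
method names are made up for the example. On a sequential stream the
combiner is never invoked and the result is complete; on a parallel stream
every partial result except the leftmost one is dropped, with no exception
and no warning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

final class BogusCombinerDemo {
    // The combiner keeps the left container and silently discards the right one.
    static Collector<Integer, List<Integer>, List<Integer>> keepLeftOnly() {
        return Collector.of(ArrayList::new, List::add, (a, b) -> a);
    }

    // Collects the integers 0..999 and reports how many of them survived.
    static int collectedSize(boolean parallel) {
        IntStream source = IntStream.range(0, 1000);
        if (parallel) {
            source = source.parallel();
        }
        return source.boxed().collect(keepLeftOnly()).size();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + collectedSize(false));
        // The parallel count depends on how the source was split,
        // but it is smaller than 1000 whenever any split happened.
        System.out.println("parallel:   " + collectedSize(true));
    }
}
```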

Well, I doubt that the JDK guys would like this proposal, but the fact is
that real-world developers rarely care about parallel processing and
just want Streams to work in sequential mode. As a result, ugly code gets
written, like a bogus combiner argument passed to the reduce/collect
methods. Probably the API should be more friendly to real user needs...

With best regards,
Tagir Valeev.
