accumulate locally and mutatively but combine concurrently (a use case)

Howard Lovatt howard.lovatt at
Thu May 9 23:50:59 PDT 2013

I raised a similar issue a while back and Brian's response was that the use
case wasn't common enough to justify.

Ideally you would want to expand your example further, for a Stream<I> you
would like a method:

<A, C, O> O collect(
  Supplier<A> supplier,
  BiFunction<A, I, A> accumulator,
  Function<A, C> toCombine,
  BinaryOperator<C> combiner,
  Function<C, O> toOutput
) { ... }

With a convenience overloads of

<C, O> O collect(
  Supplier<C> supplier,
  BiFunction<C, I, C> accumulator,
  BinaryOperator<C> combiner,
  Function<C, O> toOutput
) {
  return collect(supplier, accumulator, (c) -> c, combiner, toOutput);

<O> O collect(
  Supplier<O> supplier,
  BiFunction<O, I, O> accumulator,
  BinaryOperator<O> combiner
) {
  return collect(supplier, accumulator, (o) -> o, combiner, (o) -> o);

I collect(
  Supplier<I> supplier,
  BinaryOperator<I> accumulatorAndCombiner
) {
  return collect(supplier, accumulatorAndCombiner, (i) ->
i, accumulatorAndCombiner, (i) -> i);

My guess is that the response will be the same; that the use case is too

I don't have a feel for how rare these cases are, all I know is that I have
wanted them and now you!

Also I guess a very real concern will be extra work for the team.

On 10 May 2013 11:51, John Rose <john.r.rose at> wrote:

> How does a Collector represent the following computation:
> 1. Somebody passes out stream chunks to several processor threads.
> 2a. Each thread accumulates results into a mutable aggregate ArrayList<T>.
> 2b. (Or the aggregate could be something even cleverer that doesn't do
> log(N) copying of its values.)
> 2c. Each such ArrayList<T> is serially modified by thread-confined side
> effects.
> 2d. The JIT can understand the loop that does this.
> 2e. At the smallest scale, the writes into the arrays are performed by
> unrolled loops, even vectorized ones, with minimal per-element overhead.
> 3a. Each thread produces a finished ArrayList<T> aggregate of results for
> roll-up.
> 3b. Each thread "freezes" its result aggregate before handoff.  The
> resulting frozen aggregate is (1) thread-safe and (2) immutable.
> 3c. (For some mutable aggregates, the freezing step could also include
> operations like right-sizing or balancing.)
> 3d. The freezing operation includes a suitable memory fence operation to
> settle all side effects to the aggregate.
> 4. The guy(s) that passed out stream chunks waits for the resulting
> aggregate and combines them, using O(log N) pointer pasting operations.
> It seems to me that step 2 is well represented by a STRICTLY_MUTATIVE but
> not CONCURRENT Collector.
> And the other steps are well represented by a CONCURRENT and possibly
> But running step 2 with a CONCURRENT Collector would be bad, since even if
> each thread had its own accumulator, the accumulator would have to provide
> for safely concurrent access.  The JVM can improve the computation with
> such things using biased locking and lock coarsening, but they will never
> be fast like a normal loop into a local array.
> Do we need a collector mode which says, "I accumulate locally and
> mutatively but I combine concurrently"?
> Even more:  Should the transition from mutative to immutable be
> represented by a type shift?  In Collector<T,M,C>, M is the mutable
> aggregate, and each thread would have an export step that freeze its M into
> a C value for subsequent combination (roll-up) across threads:
>     BiFunction<M, T, M> accumulator();
>     Function<M, C> freezer();
>     BinaryOperator<C> combiner();
> (Note:  "freezer" is not a real proposal!)
> Having the extra distinction of M vs. C will cause friction in other
> places, but perhaps it is an important distinction between larval and adult
> stages[1] of a data structure.  The distinction can be erased, but even
> then there still needs to be (in step 3b above) an explicit operation (a
> Consumer<R>, the final side effect!) which freezes a value of type R.
>     BiFunction<R, T, R> accumulator();
>     Consumer<R> freezer();
>     BinaryOperator<R> combiner();
> Given the choice I would prefer the type distinction.  Even better would
> be for someone to point out the right way to use the current Collector API
> to do steps 1-4 above.
> – John
> [1]

  -- Howard.

More information about the lambda-libs-spec-observers mailing list