Post-transform and the standard Collectors

Brian Goetz brian.goetz at
Wed Jun 12 10:15:06 PDT 2013

I've done a pass on the standard Collectors to adapt them to the 
post-transform.  Significant changes:

  - All factory methods that returned Collector<T,R> now return 
Collector<T,?,R>.  (It is good that no factory method leaks its internal 
type.)  We can continue to discuss mitigation plans on this, if 
necessary, in a separate thread.

  - The accumulator function in Collector is now back to a BiConsumer 
rather than a BiFunction.  This simplified a number of implementations. 
  The STRICTLY_MUTATIVE characteristic goes away entirely.

  - toList is now back to strict ArrayList, as Remi requested.

  - toStringBuilder can now hide its StringBuilder, and collect to a 
String instead.  So I renamed it "concatenating" (and also extended it 
to accept CharSequence elements instead of only String).

  - toStringJoiner can similarly hide the internal StringJoiner, so was 
renamed to "joining(delimiter)".  (Confusion with database joins is 
possible; I'm open to a better name.)  Also on the to-do list: Paul 
suggested a way to support the full form of StringJoiner (with prefix 
and postfix) so I'll add an overload for that.

  - The various reducing collectors can now use a mutable internal box 
class, and hide that as an implementation detail, eliminating the 
internal boxing in sumBy().

  - It would be nice to overload sumBy(mapper) with int, long, and 
double versions, but unfortunately we have crossed the boundary of what 
type inference can disambiguate.  We have some choices here:
    - Have a single sumBy(ToLongFunction<T>)
    - Rename to summingXxx, allowing summingInt(ToIntFunction), 
summingLong(ToLongFunction), ...

  - I want to add averaging() collectors (and now can), which would have 
to follow whatever naming choice we select above.

  - Related, we have separately named toXxxSummaryStatistics which 
follow the same pattern.  If we go with summingInt/averagingInt, maybe 
this becomes summarizingInt?  We also have the opportunity now to make 
the resulting statistics immutable on completion -- do we want to do that?
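
A sketch of how the Xxx-suffixed naming option reads at the call site.  
The collector names used here (joining with and without prefix/suffix, 
summingInt, averagingInt, summarizingInt) are the ones that 
java.util.stream.Collectors eventually shipped with in Java 8; at the 
time of this message they were still proposals.  The class NamingSketch 
and its method names are purely illustrative.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: NamingSketch and its methods are not part of any API.
final class NamingSketch {

    // joining(delimiter): the internal StringJoiner never reaches the caller.
    static String joined(List<String> words) {
        return words.stream().collect(Collectors.joining(", "));
    }

    // The "full form" overload with prefix and suffix.
    static String bracketed(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // summingInt takes a ToIntFunction, so no overload ambiguity arises.
    static int totalLength(List<String> words) {
        return words.stream().collect(Collectors.summingInt(String::length));
    }

    // averagingInt follows the same naming pattern and yields a Double.
    static double averageLength(List<String> words) {
        return words.stream().collect(Collectors.averagingInt(String::length));
    }

    // summarizingInt is the counterpart of toIntSummaryStatistics.
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream().collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(joined(words));
        System.out.println(bracketed(words));
        System.out.println(totalLength(words));
        System.out.println(averageLength(words));
        System.out.println(lengthStats(words));
    }
}
```

Putting the primitive type in the method name lets each variant take a 
distinct functional interface, which sidesteps the overload 
disambiguation problem described above.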

To put it all in one place, here are the advantages of this additional 

  - It is the first thing that nearly every user asks for when they see 
Collector; its lack is a significant gap.  We had wanted this from the 
beginning; earlier versions of Collector made it impossible, but 
later evolutions made it possible again.
  - It makes possible Collectors like averaging(), which people want and 
which were previously not practical.
  - It enables Collectors to enforce invariants in the final result that 
cannot be enforced in the intermediate accumulation, such as tree 
balancing, immutability, etc.
  - It enables Collectors like "toStringBuilder" to not leak their 
internal state (StringBuilder) into the user code, but instead provide 
the result type that the user actually wants (String).
  - It eliminates the complexity of STRICTLY_MUTATIVE.
  - It eliminates the performance overhead of boxing during reduction.
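
Several of the points above (hidden intermediate state, no boxing 
during accumulation, a result computed only at the end) can be seen in 
one small collector.  This is a minimal sketch, not the implementation 
checked into lambda: it assumes the four-argument Collector.of factory 
from released Java 8, and the class AveragingSketch, the method name 
averaging, and the long[2] box layout are all illustrative choices.

```java
import java.util.function.ToLongFunction;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Illustrative only: not the JDK's averaging collector.
final class AveragingSketch {

    // The intermediate type (a long[2] box holding sum and count) is hidden
    // behind the wildcard; callers only ever see the Double result.
    static <T> Collector<T, ?, Double> averaging(ToLongFunction<? super T> mapper) {
        return Collector.of(
            () -> new long[2],                          // box[0] = sum, box[1] = count
            (box, t) -> {                               // BiConsumer accumulator, no boxing
                box[0] += mapper.applyAsLong(t);
                box[1]++;
            },
            (left, right) -> {                          // combiner for parallel execution
                left[0] += right[0];
                left[1] += right[1];
                return left;
            },
            // Post-transform: runs once on the final box.
            // An empty input yields 0.0 in this sketch.
            box -> box[1] == 0 ? 0.0 : (double) box[0] / box[1]);
    }

    public static void main(String[] args) {
        Double avg = Stream.of("a", "bb", "ccc").collect(averaging(String::length));
        System.out.println(avg);
    }
}
```

Without the post-transform, the caller would receive the raw long[2] 
box and have to do the division themselves, which is why averaging was 
not practical before.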

In totality, I see these benefits as a huge step forward.  I realize 
there are some rough edges and we can continue to discuss how to file 
them down, or whether we wish to live with them.

I'll be checking these into lambda shortly and posting a link to the 
docs for more detailed review.

On 5/28/2013 6:23 PM, Brian Goetz wrote:
> Adding the ability to have a post-transform function raises some
> questions about how the standard collectors should change to
> accommodate them.  These fall into two categories: - Should we? -
> How?
> For collectors like toStringBuilder, we can now collect to a String
> and not expose the intermediate StringBuilder type.  This is both
> closer to what the user wants and allows for better implementation
> hiding:
> static Collector<String, ?, String> toStringBuilder() { ... }
> Of course, now the name is wrong.  So it would need a new name.
> (Ditto for toStringJoiner.)
> It also makes sense to have a new combinator that can attach a
> post-transform to an existing Collector (name is just a
> placeholder):
> <T, I, R> Collector<T, I, R> transforming(Function<I, R>,
> Collector<T, ?, I>)
> A harder question is how much to introduce immutability.  For
> example, one negative of the current toList() collector is that the
> returned list is sometimes, but not always, immutable.  It would be
> nice to be able to commit to something.  We could easily make it
> immutable with a post-transform of Collections::unmodifiableList.  At
> first, this seems a no-brainer.  But after more thought, it's
> definitely a "should we?"
> Consider how this plays as a downstream collector.  The simplest form
> of groupingBy -- groupingBy(f) -- expands to groupingBy(f, toList()).
> If we made toList always return an immutable List, then we would have
> to apply the post-transform to every value of the resulting map,
> likely via a (sequential) Map.replaceAll on the simplest groupingBy
> operation, even when the user didn't care about immutability.  Making
> every groupingBy user pay for this seems like a lot.  (Alternately,
> the default toList() could still return an immutable list, but the
> default groupingBy could use a different downstream collector.)
> One option is to have mutable and immutable versions of every
> Collection/Map-bearing Collector.  But this is a 2x explosion of
> Collectors, after we did so much work to pare back the size of the
> Collector set.   Another is to have combinators for adding
> immutability to Collection, List, Set, and Map.   Then an immutable
> groupingBy would be:
> collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))));
> Wordy, but not terrible, and probably better than imposing the costs
> on everyone?
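
The combinator option in the quoted message can be sketched as follows.  
It assumes Collectors.collectingAndThen from released Java 8, which 
plays the same role as the placeholder "transforming" above.  
asImmutableList and asImmutableMap are hypothetical helpers named after 
the quoted example, not JDK methods, and "immutable" here means an 
unmodifiable view produced by java.util.Collections.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: these combinators are hypothetical, not JDK API.
final class ImmutableCombinators {

    // Attach an unmodifiable-list post-transform to any List-bearing collector.
    static <T> Collector<T, ?, List<T>> asImmutableList(Collector<T, ?, List<T>> downstream) {
        return Collectors.collectingAndThen(downstream, Collections::unmodifiableList);
    }

    // Same idea for Map-bearing collectors such as groupingBy.
    static <T, K, V> Collector<T, ?, Map<K, V>> asImmutableMap(Collector<T, ?, Map<K, V>> downstream) {
        return Collectors.collectingAndThen(downstream, Collections::unmodifiableMap);
    }

    // The "wordy, but not terrible" immutable groupingBy from the quoted message.
    static Map<Integer, List<String>> groupByLength(Stream<String> words) {
        return words.collect(
            asImmutableMap(
                Collectors.groupingBy(String::length,
                                      asImmutableList(Collectors.toList()))));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byLength = groupByLength(Stream.of("a", "bb", "cc"));
        System.out.println(byLength);
    }
}
```

Only callers who opt in pay for the wrapping; a plain 
groupingBy(f, toList()) is left untouched.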
