Post-transform and the standard Collectors

Remi Forax forax at
Wed Jun 12 15:17:15 PDT 2013

On 06/12/2013 10:59 PM, Brian Goetz wrote:
> I've posted a doc snapshot here:
> As to the ? issue: looking at declarations like:
> static <T,K,D,A,M extends java.util.Map<K,D>>
> Collector<T,?,M> groupingBy(...)
> there's enough generics noise there that the additional question mark 
> seems not the worst problem...

Here's an exercise for you: let's say I want to write a static method that
takes a Collector<T,?,M> as a parameter
and returns a new one that, for each method of the collector,
prints "hello" and delegates to the collector taken as a parameter.

Most of my students fail to write that code properly (i.e.,
without @SuppressWarnings everywhere).
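One way to solve the exercise without @SuppressWarnings is wildcard capture: the public method accepts Collector<T,?,M>, and a private generic helper gives the unknown intermediate type a name (A) so that every delegating method type-checks. This is a minimal sketch against the Collector interface as it eventually shipped in Java 8 (supplier, accumulator, combiner, finisher, characteristics), which may differ from the snapshot under discussion; the names LoggingCollectors, logging and capture are hypothetical.

```java
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

final class LoggingCollectors {
    private LoggingCollectors() {}

    // Public entry point: the intermediate type of the argument is unknown (?).
    static <T, M> Collector<T, ?, M> logging(Collector<T, ?, M> collector) {
        // Passing the wildcard-typed argument to a generic method captures ? as A.
        return capture(collector);
    }

    // Helper with a named intermediate type A, so each method can delegate type-safely.
    private static <T, A, M> Collector<T, A, M> capture(Collector<T, A, M> delegate) {
        return new Collector<T, A, M>() {
            @Override
            public Supplier<A> supplier() {
                System.out.println("hello");
                return delegate.supplier();
            }

            @Override
            public BiConsumer<A, T> accumulator() {
                System.out.println("hello");
                return delegate.accumulator();
            }

            @Override
            public BinaryOperator<A> combiner() {
                System.out.println("hello");
                return delegate.combiner();
            }

            @Override
            public Function<A, M> finisher() {
                System.out.println("hello");
                return delegate.finisher();
            }

            @Override
            public Set<Collector.Characteristics> characteristics() {
                System.out.println("hello");
                return delegate.characteristics();
            }
        };
    }
}
```

Because the characteristics are passed through unchanged, a stream may skip calling finisher() when the delegate reports IDENTITY_FINISH (as toList() does), so not every "hello" necessarily appears.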


> On 6/12/2013 3:13 PM, Brian Goetz wrote:
>> A question this raises: it is now possible (it wasn't before) for
>> Collectors like minBy to return Optional, like their stream
>> counterparts.  However, such a Collector is far less likely to be
>> invoked on an empty stream than Stream.minBy() is.  Here's why:
>> If all you're doing is getting the minimum of a stream, you're more
>> likely to do
>>    stream.minBy(c)
>> than
>>    stream.collect(Collectors.minBy(c))
>> The more common case for Collectors.minBy (or maxBy) is
>> downstream of a groupingBy:
>>    Map<Person, Txn> largestTxnBySeller =
>>      txns.collect(groupingBy(Txn::seller,
>>                              maxBy(comparing(Txn::amount))));
>> Here, we won't create a map key unless there is at least one value.
>> So there are arguments both for and against having these collectors
>> collect to Optional.  (If we don't, we should document the value
>> associated with no results, which is almost certainly null for minBy,
>> maxBy, and reducing(op)).
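In the API as it eventually shipped in Java 8, Collectors.maxBy does return Optional. The sketch below, which uses a hypothetical Txn class with plain String sellers standing in for Person, shows the downstream case: every Optional in the grouped map is present, and Collectors.collectingAndThen can unwrap it with Optional::get to recover the Map<Person, Txn> shape from the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

final class LargestTxnDemo {
    private LargestTxnDemo() {}

    // Hypothetical transaction type; the seller is a plain String here.
    static final class Txn {
        private final String seller;
        private final long amount;
        Txn(String seller, long amount) { this.seller = seller; this.amount = amount; }
        String seller() { return seller; }
        long amount() { return amount; }
    }

    // Downstream maxBy yields Optional<Txn>; each group has at least one
    // element, so none of these Optionals is empty.
    static Map<String, Optional<Txn>> largestWithOptional(List<Txn> txns) {
        return txns.stream().collect(Collectors.groupingBy(Txn::seller,
                Collectors.maxBy(Comparator.comparingLong(Txn::amount))));
    }

    // Unwrapping in a post-transform gives a Map<seller, Txn> directly.
    static Map<String, Txn> largest(List<Txn> txns) {
        return txns.stream().collect(Collectors.groupingBy(Txn::seller,
                Collectors.collectingAndThen(
                        Collectors.maxBy(Comparator.comparingLong(Txn::amount)),
                        Optional::get)));
    }
}
```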
>> On 6/12/2013 1:15 PM, Brian Goetz wrote:
>>> I've done a pass on the standard Collectors to adapt them to the
>>> post-transform.  Significant changes:
>>>   - All factory methods that returned Collector<T,R> now return
>>> Collector<T,?,R>.  (It is good that no factory method leaks its
>>> internal type.)  We can continue to discuss mitigation plans on this,
>>> if necessary, in a separate thread.
>>>   - The accumulator function in Collector is now back to a BiConsumer
>>> rather than a BiFunction.  This simplified a number of implementations.
>>>   The STRICTLY_MUTATIVE characteristic goes away entirely.
>>>   - toList is now back to strict ArrayList, as Remi requested.
>>>   - toStringBuilder can now hide its StringBuilder, and collect to a
>>> String instead.  So I renamed it "concatenating" (and also extended it
>>> to collect CharSequence instead of String.)
>>>   - toStringJoiner can similarly hide the internal StringJoiner, so was
>>> renamed to "joining(delimiter)".  (Confusion with database joins is
>>> possible, open to a better name.)  Also on the to-do list: Paul
>>> suggested a way to support the full form of StringJoiner (with prefix
>>> and postfix) so I'll add an overload for that.
>>>   - The various reducing collectors can now use a mutable internal box
>>> class, and hide that as an implementation detail, eliminating the
>>> internal boxing in sumBy().
>>>   - It would be nice to overload sumBy(mapper) with int, long, and
>>> double versions, but unfortunately we have crossed the boundary of what
>>> type inference can disambiguate.  We have some choices here:
>>>     - Have a single sumBy(ToLongFunction<T>)
>>>     - Rename to summingXxx, allowing summingInt(ToIntFunction),
>>> summingLong(ToLongFunction), ...
>>>   - I want to add averaging() collectors (and now can), which would
>>> have to follow whatever naming choice we select above.
>>>   - Relatedly, we have separately named toXxxSummaryStatistics
>>> collectors, which follow the same pattern.  If we go with
>>> summingInt/averagingInt, maybe this becomes summarizingInt?  We also
>>> have the opportunity now to make the resulting statistics immutable on
>>> completion -- do we want to do that?
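For comparison, the API that eventually shipped in Java 8 settled these questions with joining(delimiter, prefix, suffix), the summingXxx/averagingXxx family, and summarizingXxx. A small sketch using those shipped Collectors methods; the class and method names (NamingDemo, bracketed, totalLength, lengthStats) are only for illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

final class NamingDemo {
    private NamingDemo() {}

    // joining(delimiter, prefix, suffix) covers the full StringJoiner form.
    static String bracketed(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // Distinct names (summingInt, summingLong, summingDouble) sidestep the
    // overload choice among ToIntFunction/ToLongFunction/ToDoubleFunction.
    static int totalLength(List<String> words) {
        return words.stream().collect(Collectors.summingInt(String::length));
    }

    // summarizingInt gathers count, sum, min, average and max in one pass.
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream().collect(Collectors.summarizingInt(String::length));
    }
}
```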
>>> To put it all in one place, here are the advantages of this additional
>>> feature:
>>>   - It is the first thing that nearly every user asks for when they see
>>> Collector; its lack is a significant gap.  We had wanted this from the
>>> beginning; earlier versions of Collector made it impossible, but later
>>> evolutions made it possible again.
>>>   - It makes possible Collectors like averaging(), which people want and
>>> which were previously not practical.
>>>   - It enables Collectors to enforce invariants in the final result that
>>> cannot be enforced during intermediate accumulation, such as tree
>>> balancing, immutability, etc.
>>>   - It enables Collectors like "toStringBuilder" to not leak their
>>> internal state (StringBuilder) into the user code, but instead provide
>>> the result type that the user actually wants (String).
>>>   - It eliminates the complexity of STRICTLY_MUTATIVE.
>>>   - It eliminates the performance overhead of boxing during reduction.
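Several of these points (an averaging collector, a hidden mutable box, no per-element boxing) can be illustrated together. This is a minimal sketch using Java 8's Collector.of factory with a finisher; averagingLongs is a hypothetical name, not the shipped Collectors.averagingLong, and returning 0.0 for an empty input is an arbitrary choice made here.

```java
import java.util.function.ToLongFunction;
import java.util.stream.Collector;

final class AveragingSketch {
    private AveragingSketch() {}

    // Hypothetical averaging collector. The accumulation type (long[]) stays
    // internal: callers only see Collector<T, ?, Double>.
    static <T> Collector<T, ?, Double> averagingLongs(ToLongFunction<? super T> mapper) {
        return Collector.of(
                () -> new long[2],                                       // box: [0] = sum, [1] = count
                (box, t) -> { box[0] += mapper.applyAsLong(t); box[1]++; }, // primitive adds, no boxing per element
                (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },     // merge partial boxes
                box -> box[1] == 0 ? 0.0 : (double) box[0] / box[1]);    // post-transform: box -> Double
    }
}
```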
>>> In totality, I see these benefits as a huge step forward.  I realize
>>> there are some rough edges and we can continue to discuss how to file
>>> them down, or whether we wish to live with them.
>>> I'll be checking these into lambda shortly and posting a link to the
>>> docs for more detailed review.
>>> On 5/28/2013 6:23 PM, Brian Goetz wrote:
>>>> Adding the ability to have a post-transform function raises some
>>>> questions about how the standard collectors should change to
>>>> accommodate it.  These fall into two categories: "Should we?" and
>>>> "How?"
>>>> For collectors like toStringBuilder, we can now collect to a String
>>>> and not expose the intermediate StringBuilder type.  This is both
>>>> closer to what the user wants and allows for better implementation
>>>> hiding:
>>>> static Collector<String, ?, String> toStringBuilder() { ... }
>>>> Of course, now the name is wrong.  So it would need a new name.
>>>> (Ditto for toStringJoiner.)
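A sketch of such a renamed collector, using the name "concatenating" from the later message in this thread (hypothetical here): the StringBuilder is only the hidden accumulation type, and the finisher turns it into the String the user wants. It is written against Java 8's Collector.of; the explicit type arguments just keep overload resolution for the StringBuilder method references unambiguous.

```java
import java.util.stream.Collector;

final class ConcatenatingSketch {
    private ConcatenatingSketch() {}

    // Hypothetical "concatenating" collector: accumulates into a StringBuilder
    // but exposes only String as the result type.
    static Collector<CharSequence, ?, String> concatenating() {
        return Collector.<CharSequence, StringBuilder, String>of(
                StringBuilder::new,          // hidden intermediate container
                StringBuilder::append,       // append each CharSequence
                (a, b) -> a.append(b),       // merge two partial builders
                StringBuilder::toString);    // post-transform: builder -> String
    }
}
```

Java 8 eventually shipped this behavior as the no-argument Collectors.joining().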
>>>> It also makes sense to have a new combinator that can attach a
>>>> post-transform to an existing Collector (name is just a
>>>> placeholder):
>>>> <T, I, R> Collector<T, ?, R> transforming(Function<I, R> finisher,
>>>>                                           Collector<T, ?, I> downstream)
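Such a combinator can be sketched with the same wildcard-capture trick as above: compose the downstream finisher with the new function, and drop IDENTITY_FINISH, since the composed finisher is no longer the identity. The result's intermediate type is hidden as ?, because it is the downstream's accumulation type rather than I. The name transforming and the argument order follow the placeholder above; Java 8 later shipped the equivalent as Collectors.collectingAndThen(downstream, finisher), with the arguments reversed.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collector;

final class TransformingSketch {
    private TransformingSketch() {}

    // Hypothetical combinator with the placeholder name and argument order.
    static <T, I, R> Collector<T, ?, R> transforming(Function<I, R> f, Collector<T, ?, I> downstream) {
        return attach(f, downstream); // captures the downstream's ? as A
    }

    private static <T, A, I, R> Collector<T, A, R> attach(Function<I, R> f, Collector<T, A, I> c) {
        // The composed finisher is not the identity, so IDENTITY_FINISH must go.
        Set<Collector.Characteristics> ch = EnumSet.noneOf(Collector.Characteristics.class);
        ch.addAll(c.characteristics());
        ch.remove(Collector.Characteristics.IDENTITY_FINISH);
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(),
                c.finisher().andThen(f),
                ch.toArray(new Collector.Characteristics[0]));
    }
}
```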
>>>> A harder question is how much immutability to introduce.  For
>>>> example, one negative of the current toList() collector is that the
>>>> returned list is sometimes, but not always, immutable.  It would be
>>>> nice to be able to commit to something.  We could easily make it
>>>> immutable with a post-transform of Collections::unmodifiableList.  At
>>>> first, this seems a no-brainer.  But after more thought, it's
>>>> definitely a "should we?"
>>>> Consider how this plays as a downstream collector.  The simplest form
>>>> of groupingBy -- groupingBy(f) -- expands to groupingBy(f, toList()).
>>>> If we made toList always return an immutable List, then we would have
>>>> to apply the post-transform to every value of the resulting map,
>>>> likely via a (sequential) Map.replaceAll on the simplest groupingBy
>>>> operation, even when the user didn't care about immutability.  Making
>>>> every groupingBy user pay for this seems like a lot.  (Alternatively,
>>>> the default toList() could still return an immutable list, but the
>>>> default groupingBy could use a different downstream collector.)
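The post-pass described here can be sketched with Map.replaceAll: group first, then wrap every value. The pass is sequential and touches every entry, which is the cost each groupingBy user would pay. (OpenJDK 8's groupingBy does essentially this internally when the downstream collector has a non-identity finisher.) The class and method names are hypothetical; HashMap::new is passed so the map is known to be mutable before the pass.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReplaceAllSketch {
    private ReplaceAllSketch() {}

    static Map<Integer, List<String>> byLength(List<String> words) {
        // Group into mutable lists inside a mutable HashMap.
        Map<Integer, List<String>> m = words.stream()
                .collect(Collectors.groupingBy(String::length, HashMap::new, Collectors.toList()));
        // Sequential post-pass: wrap each value list, one entry at a time.
        m.replaceAll((k, v) -> Collections.unmodifiableList(v));
        return m;
    }
}
```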
>>>> One option is to have mutable and immutable versions of every
>>>> Collection/Map-bearing Collector.  But this is a 2x explosion of
>>>> Collectors, after we did so much work to pare back the size of the
>>>> Collector set.   Another is to have combinators for adding
>>>> immutability to Collection, List, Set, and Map.   Then an immutable
>>>> groupingBy would be:
>>>> collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))));
>>>> Wordy, but not terrible, and probably better than imposing the costs
>>>> on everyone?
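The combinator approach can be sketched on top of Java 8's Collectors.collectingAndThen, wrapping results with Collections.unmodifiableList and Collections.unmodifiableMap (read-only views, which is one possible reading of "immutable"). asImmutableList and asImmutableMap are the hypothetical names from the message; byLength instantiates the immutable groupingBy above with f = String::length.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class ImmutableSketch {
    private ImmutableSketch() {}

    // Hypothetical combinator: wrap the List produced by a collector.
    static <T, E> Collector<T, ?, List<E>> asImmutableList(Collector<T, ?, List<E>> c) {
        return Collectors.collectingAndThen(c, Collections::unmodifiableList);
    }

    // Hypothetical combinator: wrap the Map produced by a collector.
    static <T, K, V> Collector<T, ?, Map<K, V>> asImmutableMap(Collector<T, ?, Map<K, V>> c) {
        return Collectors.collectingAndThen(c, Collections::unmodifiableMap);
    }

    // collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))))
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(asImmutableMap(
                Collectors.groupingBy(String::length,
                        asImmutableList(Collectors.<String>toList()))));
    }
}
```

Only callers who ask for immutability pay for the wrapping; the plain groupingBy(f) path is untouched.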

More information about the lambda-libs-spec-experts mailing list