Collectors inventory

Remi Forax forax at
Mon Mar 4 10:27:12 PST 2013

On 03/04/2013 04:37 PM, Brian Goetz wrote:
>>> As I promised a long time ago, here's an overview of what's in
>>> Collectors currently.
>> I think there are too many methods in Collectors, we should restrain
>> ourselves to 2 forms (3 max).
> Let me make sure I understand the rationale for such a rule.
> Having more forms has a clear advantage: the client code is simpler 
> (e.g., free of extra noise like HashMap::new when the user doesn't 
> care what Map he gets.)

Having to open and read the javadoc each time you want to use a 
Collector, or worse, each time you read code that uses a Collector, is a 
big disadvantage IMO. The whole Collector API has to fit into a human 

>   And the implementations are trivial, so the implementation 
> complexity is not an issue.

No, the issue is more to understand the difference between all the 

>   Is the sole issue here the "OMG so many Collectors" reaction when 
> the user goes to the Javadoc page for Collectors?

It's more "OMG, I have to read code that uses a Collector" ...

>>> There are 12 basic forms:
>>>  - toCollection(ctor)
>>>  - toList()
>>>  - toSet()
>>>  - toStringBuilder()
>>>  - toStringJoiner(delimiter)
>>>  - to{Long,Double}Statistics
>>>  - groupingBy(classifier, mapFactory, downstream collector)
>>>  - groupingReduce(classifier, mapFactory, mapper, reducer)
>>>  - mapping(mappingFn, downstream collector)
>>>  - joiningWith(mappingFunction, mergeFunction, mapFactory)
>>>  - partitioningBy(predicate, downstream collector)
>>>  - partitioningReduce(predicate, mapper, reducer)
> To be clear, has anyone objected to any of these basic forms, or are 
> we only talking about the variants?

I am talking about variants.
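
For readers following along with a released JDK, here is a minimal sketch of how a few of these basic forms look using java.util.stream.Collectors as it shipped in Java 8. The mapping from the draft names to the released ones (toStringJoiner to joining, to{Long,Double}Statistics to summarizingLong/summarizingDouble) is an assumption based on the names, not something stated in this thread. The class name, sample data, and helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class BasicCollectors {
    // Hypothetical sample data.
    static final List<String> WORDS = Arrays.asList("pear", "fig", "apple", "fig");

    // toList(): the library chooses the List implementation.
    static List<String> asList() {
        return WORDS.stream().collect(Collectors.toList());
    }

    // toSet(): the library chooses the Set implementation; duplicates collapse.
    static Set<String> asSet() {
        return WORDS.stream().collect(Collectors.toSet());
    }

    // toCollection(ctor): the caller names the collection type explicitly.
    static TreeSet<String> asTreeSet() {
        return WORDS.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // joining(delimiter): assumed released counterpart of toStringJoiner(delimiter).
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    // summarizingLong(mapper): assumed released counterpart of toLongStatistics.
    static LongSummaryStatistics lengths() {
        return WORDS.stream().collect(Collectors.summarizingLong(String::length));
    }

    // partitioningBy(predicate, downstream) combined with mapping(fn, downstream).
    static Map<Boolean, Set<Integer>> lengthsByShortness() {
        return WORDS.stream().collect(
            Collectors.partitioningBy(w -> w.length() <= 3,
                Collectors.mapping(String::length, Collectors.toSet())));
    }

    public static void main(String[] args) {
        System.out.println(asList());
        System.out.println(asTreeSet());
        System.out.println(joined());
        System.out.println(lengths());
        System.out.println(lengthsByShortness());
    }
}
```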

>>> GroupingBy has four forms:
>>>  - groupingBy(T->K) -- standard groupBy, values of resulting Map are
>>> Collection<T>
>>>  - Same, but with explicit constructors for map and for rows (so you
>>> can produce, say, a TreeMap<K, TreeSet<T>> and not just a
>>> Map<K,Collection<T>>)
>>>  - groupingBy(T->K, Collector<T,D>) -- multi-level groupBy, where
>>> downstream is another Collector
>>>  - Same, but with explicit ctor for map
>> You can remove the third one, given that you have the one with an
>> explicit constructor.
> I think it's a false economy to suggest removing this one.  Think about 
> the user code:
>   collect(groupBy(Foo::first, groupBy(Foo::second)))
> is really clear.  The extra map ctor:
>   collect(groupBy(Foo::first, groupBy(Foo::second), HashMap::new))
> really feels like noise when reading the code -- all for the sake of 
> removing a trivial overload?  Also, for some collectors, we may want a 
> specialized Map implementation, one that is, say, optimized for 
> merging.  (Partition, at this point, is basically groupBy with an 
> optimized Map implementation.)  In which case the explicit 
> HashMap::new is a performance impediment.

If you have such a Map, you should make it public; people will reuse it.

Now, groupBy of groupBy is a corner case, and for a corner case it's 
usually better to be a little more verbose if you end up with only one 
form. Again, it's easier to read and easier to write.
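
To make the trade-off concrete, here is a minimal sketch of the two call shapes under discussion, written against the Collectors.groupingBy overloads that shipped in Java 8 rather than the draft groupBy names used in this thread. Foo, its accessors, and the sample data are hypothetical. Note that in the released API the map factory is the second argument, not the last one as in the quoted draft call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingByForms {
    // Hypothetical element type standing in for the thread's Foo.
    static final class Foo {
        private final String first;
        private final int second;

        Foo(String first, int second) {
            this.first = first;
            this.second = second;
        }

        String first() { return first; }
        int second() { return second; }
    }

    // Hypothetical sample data.
    static final List<Foo> FOOS = Arrays.asList(
        new Foo("a", 1), new Foo("a", 2), new Foo("b", 1), new Foo("a", 1));

    // Short form: the library picks the Map implementation at both levels.
    static Map<String, Map<Integer, List<Foo>>> implicitMaps() {
        return FOOS.stream().collect(
            Collectors.groupingBy(Foo::first,
                Collectors.groupingBy(Foo::second)));
    }

    // Explicit form: the caller names the outer Map implementation.
    static Map<String, Map<Integer, List<Foo>>> explicitOuterMap() {
        return FOOS.stream().collect(
            Collectors.groupingBy(Foo::first, TreeMap::new,
                Collectors.groupingBy(Foo::second)));
    }

    public static void main(String[] args) {
        // Both calls group the same elements; only the outer Map type differs.
        System.out.println(implicitMaps().equals(explicitOuterMap()));
        System.out.println(explicitOuterMap().keySet());
    }
}
```

The explicit form costs one extra argument at each call site; whether that is noise or useful documentation is exactly the disagreement in this thread.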

> So, while I accept that removing the non-explicit-ctor versions could 
> reduce the number of forms, I think it's a false economy -- because the 
> resulting user code is worse.

User code is better because there are fewer overloads (or better ones) that 
can match.

Maybe later, for jdk9 or jdk10, you can add more collectors if people 
ask, but I think it's important here to be as simple as possible.


More information about the lambda-libs-spec-experts mailing list