Collectors update redux
brian.goetz at oracle.com
Thu Feb 7 17:11:44 PST 2013

> I can think of uses for all of it, but I worry about someone faced
> with picking the right static factory method of Collectors. Maybe
> with the right class comment, users can be guided to the right
> combinator without having to know much.

It's worth noting that the only method that is really needed is:

    <R> R reduce(Supplier<R> factory,
                 BiFunction<R, T, R> reducer,
                 BinaryOperator<R> combiner);

All the other forms of reduce/collect can be written in terms of this
one -- though some are more awkward than others. Similarly, all the
Collectors are just "macros" for specific combinations of inputs to this
form of reduce.
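As a rough illustration of the "macros" point, here is a Java sketch. The reduce method below is a hypothetical standalone stand-in taking a factory, a reducer and a combiner (the combiner is an assumption here, since any parallel reduction needs some way to merge partial results); it is not the shipped Stream API, whose closest relatives are Stream.reduce(identity, accumulator, combiner) and Stream.collect(supplier, accumulator, combiner). The names count, sum and toList are illustrative: each is just one particular choice of the three arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

final class ReduceSketch {

    // Hypothetical general reduce: reduce each half of the list separately,
    // then merge the two partial results with the combiner, roughly as a
    // parallel split would do it.
    static <T, R> R reduce(List<T> input,
                           Supplier<R> factory,
                           BiFunction<R, T, R> reducer,
                           BinaryOperator<R> combiner) {
        int mid = input.size() / 2;
        R left = factory.get();
        for (T t : input.subList(0, mid)) {
            left = reducer.apply(left, t);
        }
        R right = factory.get();
        for (T t : input.subList(mid, input.size())) {
            right = reducer.apply(right, t);
        }
        return combiner.apply(left, right);
    }

    // "Macro": counting is a reduce starting from 0 that adds 1 per element.
    static <T> long count(List<T> input) {
        return reduce(input, () -> 0L, (n, t) -> n + 1, Long::sum);
    }

    // "Macro": summing is a reduce starting from 0 that adds each element.
    static int sum(List<Integer> input) {
        return reduce(input, () -> 0, (acc, x) -> acc + x, Integer::sum);
    }

    // "Macro": collecting into a list; a mutable result works as long as the
    // factory hands each half its own fresh list.
    static <T> List<T> toList(List<T> input) {
        return ReduceSketch.<T, List<T>>reduce(input,
                ArrayList::new,
                (list, t) -> { list.add(t); return list; },
                (a, b) -> { a.addAll(b); return a; });
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(3, 1, 4, 1, 5);
        System.out.println(count(xs));  // 5
        System.out.println(sum(xs));    // 14
        System.out.println(toList(xs)); // [3, 1, 4, 1, 5]
    }
}
```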
And, as to the Collectors, groupBy can be written in terms of
groupingReduce; partitioning is just grouping with a boolean-valued
function; joiningWith is a form of groupingReduce too. We don't *need*
any of them. They're all just reductions that can be expressed with that
one form of reduce.
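The groupBy, groupingReduce and joiningWith names here are draft names. In the API that eventually shipped in Java 8 (java.util.stream.Collectors), the roughly analogous collectors are groupingBy and partitioningBy, which makes "partitioning is just grouping with a boolean-valued function" easy to check. A minimal sketch; the class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitionSketch {

    // Dedicated collector: split elements by a predicate.
    static Map<Boolean, List<Integer>> partitionEvens(List<Integer> xs) {
        return xs.stream().collect(Collectors.partitioningBy(x -> x % 2 == 0));
    }

    // The same split as ordinary grouping, using a boolean-valued classifier.
    static Map<Boolean, List<Integer>> groupEvens(List<Integer> xs) {
        return xs.stream().collect(Collectors.groupingBy(x -> x % 2 == 0));
    }

    // One-argument groupingBy is documented as similar to the two-argument
    // form with a toList() downstream collector.
    static Map<Boolean, List<Integer>> groupEvensExplicit(List<Integer> xs) {
        return xs.stream().collect(
                Collectors.groupingBy(x -> x % 2 == 0, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5);
        System.out.println(partitionEvens(xs));
        System.out.println(groupEvens(xs));
        System.out.println(groupEvensExplicit(xs));
    }
}
```

One difference does survive: partitioningBy always reports both a true and a false key, while groupingBy only creates keys that actually occur.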
So we *could* boil everything down to just one method. But, of course,
we should not, because the client code gets harder to write, harder to
read, and more error-prone. Each "A can be written in terms of B"
requires an "aha" that is obvious in hindsight but could well be slow in
coming.

So it's really a question of "where do we turn the knob to." The forms
of reduce we've got are a (non-orthogonal) set that is (subjectively)
tailored to specific categories of perceived-to-be-common situations.
Similarly, the set of Collectors is based on having scoured various "100
cool examples with <my favorite query framework>" to distill out common
use cases. None of the Collectors add any "power" in the sense they can
all be written as raw reduce; but they do add expressiveness. Each one
you take away makes some clearly imaginable use case harder. And each
one you add moves us closer to combinator overload.

For example, suppose we take away mapping(T->U, Collector<U>). The user
wants to compute "average sale by salesman". He sees
groupBy(Txn::seller), but that gives him a Collection<Txn> for each
seller, not what he wants. He sees groupBy(Txn::seller, Collector<Txn>),
and he sees toStatistics, which will give him the average/min/max he
wants, but he can't bridge the two. So he has to either do it in two
passes or write his own averaging reducer. That isn't terribly hard, but
he'd rather reuse the one in the library.
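To make the salesman example concrete, here is the two-pass workaround written against the shipped Java 8 Collectors API. Txn, its amount accessor and the sample data are assumptions for illustration; the text only names Txn and Txn::seller.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TwoPassAverage {

    // Hypothetical transaction type: only Txn and Txn::seller come from the
    // text, so the amount field is an assumption.
    static final class Txn {
        private final String seller;
        private final double amount;

        Txn(String seller, double amount) {
            this.seller = seller;
            this.amount = amount;
        }

        String seller() { return seller; }
        double amount() { return amount; }
    }

    // Without a mapping combinator: the first pass groups whole Txns by
    // seller, the second pass walks each group again to average its amounts.
    static Map<String, Double> averageBySeller(List<Txn> txns) {
        Map<String, List<Txn>> bySeller =
                txns.stream().collect(Collectors.groupingBy(Txn::seller));
        Map<String, Double> averages = new TreeMap<>();
        bySeller.forEach((seller, group) -> averages.put(
                seller,
                // Groups are never empty, so orElse is only a formality.
                group.stream().mapToDouble(Txn::amount).average().orElse(0.0)));
        return averages;
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
                new Txn("ann", 10.0), new Txn("bob", 4.0), new Txn("ann", 20.0));
        System.out.println(averageBySeller(txns)); // {ann=15.0, bob=4.0}
    }
}
```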
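Adding in mapping(T->U, Collector<U>) lets him nest a mapping from each
Txn to its sale amount, feeding toStatistics, inside
groupBy(Txn::seller, ...) and be done, with code that is still readable
and obviously correct.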
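With a mapping combinator the same question takes one pass. The sketch below uses the shipped Collectors.mapping; summarizingDouble stands in for the draft toStatistics (it is the closest shipped analogue I know of, giving count, min, max and average), and Txn is again a hypothetical stand-in with an assumed amount field.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MappingAverage {

    // Hypothetical transaction type; only Txn and Txn::seller come from the text.
    static final class Txn {
        private final String seller;
        private final double amount;

        Txn(String seller, double amount) {
            this.seller = seller;
            this.amount = amount;
        }

        String seller() { return seller; }
        double amount() { return amount; }
    }

    // One pass: group by seller, map each Txn to its amount, and feed the
    // amounts into a statistics collector.
    static Map<String, DoubleSummaryStatistics> statsBySeller(List<Txn> txns) {
        return txns.stream().collect(
                Collectors.groupingBy(Txn::seller,
                        Collectors.mapping(Txn::amount,
                                Collectors.summarizingDouble(Double::doubleValue))));
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
                new Txn("ann", 10.0), new Txn("bob", 4.0), new Txn("ann", 20.0));
        statsBySeller(txns).forEach((seller, stats) ->
                System.out.println(seller + " avg=" + stats.getAverage()));
    }
}
```

In the shipped API summarizingDouble also accepts a ToDoubleFunction directly, so summarizingDouble(Txn::amount) would do the same job without mapping; the sketch keeps mapping to mirror the argument above.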
For every single one of these, we could make the argument "we don't need
it because it's ten lines of code the user could write if he needs it" (all
the Collectors are tiny); then again for every single one of them, we
could make the argument that it's self-contained and useful for
realistic use cases.
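For a sense of scale on "ten lines of code the user could write", here is a hand-rolled equivalent of mapping built with the public Collector.of factory. myMapping is a hypothetical name; it wraps the downstream collector's accumulator so each element is transformed before being accumulated, and passes the supplier, combiner, finisher and characteristics through unchanged.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class HandRolledMapping {

    // Hypothetical user-written equivalent of Collectors.mapping: apply f to
    // each element, then hand the result to the downstream accumulator.
    static <T, U, A, R> Collector<T, ?, R> myMapping(
            Function<? super T, ? extends U> f,
            Collector<? super U, A, R> downstream) {
        BiConsumer<A, ? super U> accumulator = downstream.accumulator();
        return Collector.of(
                downstream.supplier(),
                (A container, T t) -> accumulator.accept(container, f.apply(t)),
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }

    public static void main(String[] args) {
        List<Integer> lengths = Stream.of("a", "bb", "ccc")
                .collect(myMapping(String::length, Collectors.toList()));
        System.out.println(lengths); // [1, 2, 3]
    }
}
```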
So in the end the "right" set will be highly subjective. Personally, I
think we've got just about the right set of operations, but maybe too
many flavors of each. (Note we already took away the flatMap-like
flavors of groupBy, where each input element can be mapped to multiple
output elements, which already cut the number of combinations in half.)
And maybe we could cut back on the variations (e.g., eliminate the
forms that let you provide your own Map constructor, so you always just
get a HashMap). Or maybe we have the right forms and flavors, but we
need a more Builder-like API to regularize it. Or maybe slicing them
differently will be less confusing. Or more confusing.
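For the "provide your own Map constructor" flavors, the API that shipped in Java 8 kept both: groupingBy(classifier) and groupingBy(classifier, mapFactory, downstream). A small sketch of what the extra flavor buys; the class and method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MapFactoryFlavors {

    // Default flavor: the caller gets whatever Map the library chooses.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Map-constructor flavor: the caller asks for a TreeMap so keys are sorted.
    static TreeMap<Integer, List<String>> byLengthSorted(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "fig", "kiwi", "apple");
        System.out.println(byLength(words));       // key order not guaranteed
        System.out.println(byLengthSorted(words)); // {3=[fig], 4=[pear, kiwi], 5=[apple]}
    }
}
```

The shipped javadoc for the one-argument groupingBy makes no guarantee about the type of Map returned, so code that needs, say, sorted keys has to reach for the map-factory flavor.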
So, constructive input welcome!

More information about the lambda-libs-spec-observers mailing list