Stream operations -- current set

Doug Lea dl at
Sat Sep 15 07:24:43 PDT 2012

On 09/14/12 16:56, Brian Goetz wrote:
> Here's the current set of stream operations.
> ...
>      <R> Stream<R> flatMap(FlatMapper<? super T, R> mapper);

> Of these, the only one where there is some controversy is over the signature of
> flatMap, where the mapper takes a lambda into which the results are pushed.
> Some people prefer something like
>     flatMap(t -> Collection<T>)
> or
>     flatMap(t -> T[])

To further rub in how central the "little" issues of optional/null
(as well as numerics) are in all this, note that flatMap is just a
special form of mapReduce(x->coll, addAll). That can be implemented
so that a basis/default policy is needed only when there are no
elements, in which case it could do one of:
  (1) return null
  (2) accept an empty-collection generator as a basis/defaultValue arg
  (3) return Optional
  (4) factor into a special flatMapper interface that absorbs the
      problem (as Brian chose).
In CHM, I support unified forms of map+reduce explicitly, which
leverage its intrinsic null policy to naturally use option #1, so
method flatMap does not even appear.
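
A minimal sketch of the first three options, to make the equivalence
concrete. It assumes the java.util.stream API as it later shipped in
Java 8, not the draft signatures quoted in this thread; the class and
method names (FlatMapAsMapReduce, concat, withBasis, withOptional,
withNull) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class FlatMapAsMapReduce {
    // Non-destructive "addAll": a new list with a's elements, then b's.
    static <R> List<R> concat(List<R> a, List<R> b) {
        List<R> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Option (2): an empty collection is supplied as the basis.
    static <T, R> List<R> withBasis(List<T> src, Function<T, List<R>> mapper) {
        return src.stream().map(mapper)
                  .reduce(new ArrayList<R>(), FlatMapAsMapReduce::concat);
    }

    // Option (3): no basis; an empty source yields an empty Optional.
    static <T, R> Optional<List<R>> withOptional(List<T> src,
                                                 Function<T, List<R>> mapper) {
        return src.stream().map(mapper).reduce(FlatMapAsMapReduce::concat);
    }

    // Option (1): no basis; an empty source yields null.
    static <T, R> List<R> withNull(List<T> src, Function<T, List<R>> mapper) {
        return withOptional(src, mapper).orElse(null);
    }

    public static void main(String[] args) {
        List<Integer> src = Arrays.asList(1, 2, 3);
        Function<Integer, List<Integer>> mapper = x -> Arrays.asList(x, x * 10);
        System.out.println(withBasis(src, mapper));
        System.out.println(withOptional(new ArrayList<Integer>(), mapper));
        System.out.println(withNull(new ArrayList<Integer>(), mapper));
    }
}
```

The three variants differ only in what they return for an empty
source, which is the whole of the policy question. Copying in concat
keeps the reducer free of side effects; it is not meant as an
efficient implementation.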

> Intermediate / Lazy (Stateful)
> ------------------------------
>      Stream<T> uniqueElements();
>      Stream<T> sorted(Comparator<? super T> comparator);
>      Stream<T> cumulate(BinaryOperator<T> operator);
>      Stream<T> sequential();

(Capsule summary of many, um, discussions between Brian and me:
I hate all of these. But not enough to act hatefully about them :-)

>      <U> Map<U, Collection<T>> groupBy(Mapper<? super T, ? extends U> classifier);
>      <U, W> Map<U, W> reduceBy(Mapper<? super T, ? extends U> classifier,
>                                Factory<W> baseFactory,
>                                Combiner<W, T, W> reducer);
> The most controversial signature here is groupBy, because it is the only place
> in the Streams API that is tied to Collections.

So why is this in Streams rather than in Maps?

> Don has suggested a multi-valued version of groupBy:
>      <U> Map<U, Collection<T>> groupByMulti(FlatMapper<? super T, ? extends U>
> classifier);
> which is easy to implement and makes sense to me.

The main argument against this is that at least in parallel designs,
it is vastly better to reduce the nested value collection while
it is being generated. There are surely cases where circumstances
don't let you do this, but it's a little uncomfortable to
support a method that you hope that people only rarely use.
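
To illustrate the difference, here is a rough sketch that counts
words per length two ways: once by building the nested collections
and reducing them afterwards, and once by reducing each group as it
is formed. It assumes the Collectors API as later shipped in Java 8,
not the draft groupBy/reduceBy signatures quoted above; the class and
method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupAndReduce {
    // Materializes every per-key list first, then reduces each list.
    static Map<Integer, Long> groupThenCount(List<String> words) {
        Map<Integer, List<String>> groups =
            words.stream().collect(Collectors.groupingBy(String::length));
        Map<Integer, Long> counts = new HashMap<>();
        groups.forEach((len, ws) -> counts.put(len, (long) ws.size()));
        return counts;
    }

    // Reduces each group while it is generated; no per-key list is kept.
    static Map<Integer, Long> countWhileGrouping(List<String> words) {
        return words.parallelStream()
            .collect(Collectors.groupingBy(String::length,
                                           Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "ddd");
        System.out.println(groupThenCount(words));
        System.out.println(countWhileGrouping(words));
    }
}
```

Both methods produce the same map; the second never holds the
intermediate lists, so in a parallel run it has only the partial
counts to merge.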

> The reduceBy method is one of my favorites.  (Not sure if we have the signature
> quite right yet, it probably needs multiple versions.)  It is a combination of
> group-by and reduce-values.  So if you want to compute the highest score by person:
> Map<Name, Integer> bestScoresByPerson =
>    scores.reduceBy(s -> s.getName(),
>                    ()-> 0,
>                    (sc, s) -> max(sc, s.getScore()));

(Right. Better support for constructions like this was one of
the reasons I expanded lambda-accepting methods in CHM a few
months ago.)
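
As a hedged illustration, the best-score example can be written
against ConcurrentHashMap.merge as it appears in Java 8. Whether
merge is among the methods referred to above is an assumption, and
the Score class and bestScoresByPerson method are hypothetical
stand-ins for the types in the quoted example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BestScores {
    // Hypothetical element type: one score for one named person.
    static final class Score {
        private final String name;
        private final int score;
        Score(String name, int score) { this.name = name; this.score = score; }
        String getName() { return name; }
        int getScore() { return score; }
    }

    // Group by name and keep the maximum score in a single pass.
    static Map<String, Integer> bestScoresByPerson(List<Score> scores) {
        ConcurrentHashMap<String, Integer> best = new ConcurrentHashMap<>();
        // merge stores the value if the key is absent; otherwise it
        // atomically replaces the old value with max(old, new).
        scores.parallelStream()
              .forEach(s -> best.merge(s.getName(), s.getScore(), Integer::max));
        return best;
    }

    public static void main(String[] args) {
        List<Score> scores = Arrays.asList(
            new Score("ann", 3), new Score("bob", 5), new Score("ann", 7));
        System.out.println(bestScoresByPerson(scores));
    }
}
```

Unlike the quoted reduceBy call, no basis value such as 0 is needed
here: an absent key plays that role.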


More information about the lambda-libs-spec-observers mailing list